GENERALIZED THURSTONE AND GUTTMAN SCALES
FOR MEASURING TECHNICAL SKILLS
IN JOB PERFORMANCE¹


DOUGLAS G. SCHULTZ

Applied Psychological Services, Wayne, Pennsylvania

¹ This study was performed under Contract Nonr 279(00) between Applied Psychological Services and the Office of Naval Research. We are indebted to Benson, D. Smith, J. Nagay, G. D. Mayo, and P. Federman for their assistance throughout the work.

Psychological scaling techniques have, for the most part, been used in the measurement of attitudes and sensory phenomena. There has been increasing interest, however, in the application of these methods to measurement problems associated with industrial psychology. For example, Mosel, Fine, and Boling (1960) have recently reported the usefulness of scaling for estimating worker requirements.

The study described here investigated the applicability of the Thurstone and Guttman scaling techniques for job skill measurement purposes in several related jobs. Generalized measurement instruments based on these techniques would allow a more economical means of measuring on-the-job performance with a minimum of different forms. Moreover, the establishment of a common, scaled job skill hierarchy would provide a kind of common base across the related specialties. This base would have implications for cross-specialty evaluations, job task analysis content, career planning, the establishment of training requirements across specialties, etc., and possibly might give some basis for grouping across jobs for various purposes. Thus a common taxonomy for describing related jobs might be provided.

In a series of studies which had as one purpose the development of criterion measures for the posttraining performance evaluation of enlisted personnel in various naval aviation ratings (job specialties), Technical Behavior Check Lists (TBCLs) were developed for four ratings (Siegel, Richlin, & Schultz, 1960; Siegel, Richlin, & Federman, 1960). The TBCLs were comprehensive, detailed lists of the tasks performed by men in each rating. They were found to possess those characteristics customarily thought to be essential in a sound criterion measure. However, it was felt that application of psychological scaling methods might lead to shorter and more convenient check lists and, at the same time, add further substance to their meaning.

The scaling approaches employed were those proposed by Thurstone (1929) and by Guttman (1950). Siegel and Benson (1959) demonstrated the scalability, in both the Thurstone and Guttman senses, of the skills involved in the naval aviation electronics technician specialty. Siegel, Schultz, and Benson (1960) achieved similar results for the skills involved in the Naval specialty of aviation machinist’s mate. Although the check lists developed in these studies were of value for the posttraining evaluation of technicians within a particular rating, it appeared that a short scaled check list which would apply to several ratings would have wider significance and even greater usefulness, and would also be of considerable interest from the standpoint of extending the application of scaling techniques.

The purpose of the present study, therefore, was to investigate whether technical proficiency criterion measurement instruments could be constructed which could be applied across several related naval job specialties (ratings) and which could be scaled across these ratings by both the Thurstone and Guttman techniques. Achieving this purpose embraced two steps: (a) developing behaviorally based items that were general enough to apply to the skills included in the several ratings and yet covered the important duties of each rating, and (b) scaling the items over the several ratings.

Electronics was selected as the broad area 
within which the research would focus. The 
following four naval ratings, which were avail- 
able for study, were felt to involve skills of 
various related types within electronics: avia- 
tion electrician’s mate, aviation electronics 
technician, aviation fire control technician, 
TRADEVMAN (Training Devices Man). 


DEVELOPMENT AND ADMINISTRATION OF THE
PRELIMINARY TASK LIST


The possibility of constructing a general- 
ized technical skill check list that would scale 
rested, first of all, upon evolving an appro- 
priate list of the tasks performed in the sev- 
eral ratings and casting these tasks in a form 
that would have essentially equivalent mean- 
ing for all of the four ratings included. Previ- 
ous Applied Psychological Services’ studies of 
naval technicians in electronically oriented 
specialties provided a source of specific sug- 
gestions. Consultations were also held with 
staff members of the Naval Air Technical 
Training Command. Out of this background 
a list of 28 tasks was prepared. The form of 
the items was to present only the basic func- 
tion in each task, such as “operates” or “cali- 
brates,” without reference to any specific 
equipment. The general directions for the list 
stated that each item was to be interpreted 
as applying to the “equipment which is
encompassed by the rating.” The list was then
submitted for comment to 28 instructors at 
the Naval Air Technical Training Command 
who had squadron experience in one of the 
four ratings being studied. In general, the in- 
structors found the list to be complete and 
the terminology acceptable. 

Although a few minor revisions were sug- 
gested by the instructors, all 28 tasks were 
retained in the preliminary task list. In this 
form the respondent was presented with a 
seven-point continuum and asked to indicate 
where on the continuum each task would fall
in difficulty for the average striker (a trained 
worker with minimum experience in the job 
specialty). The respondents were provided 
with gummed, prenumbered response labels in 
amounts such that the frequency distribution 
of the numbers on the stickers roughly ap- 
proximated a normal distribution. The pre- 
liminary task list was administered in groups 
to 242 enlisted supervisory personnel in the 
four ratings studied. The supervisors were dis- 
tributed among three pay grades (Chief, 
First Class, and Second Class Petty Officer), 
among 22 squadrons, and across five locations. 


THE THURSTONE SCALES 


Using the response data obtained from the 
administration of the preliminary task list to 
the 242 supervisors, the median and inter- 
quartile range were calculated for each item 
(or task). These provided the scale (S) and 
deviation (Q) values needed for establishing 
a scale according to Thurstone’s method of 
equal appearing intervals. The results are 
plotted in Figure 1. In examining this figure, 
it should be remembered that the rater was 
forced to respond on a seven-point scale and 
to normalize approximately the distribution 
of his responses. The lowest scale value ob- 
tained was 1.52 for Item 7 (Removing) and 
the highest was 6.05 for Item 11 (Trouble 
shooting/isolating malfunction(s) in). While 
this is a satisfactory range of S values, the 
very extreme positions are not represented. 
The Q values are fairly constant over the en- 
tire range of S values, although there is a 
slight suggestion that the Q values are higher 
for the more difficult tasks. 
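
The computation of the S and Q values can be illustrated with a minimal sketch. The function name and the ratings shown are hypothetical and are not data from the study. The sketch uses a plain median and interquartile range; scale values such as 1.52 suggest that the original computation interpolated within rating categories, which is not attempted here.

```python
from statistics import median, quantiles


def scale_and_q_values(ratings_by_item):
    """Return {item: (S, Q)} for each task.

    S is the median of the difficulty ratings given to the task and
    Q is the interquartile range of those ratings.
    """
    result = {}
    for item, ratings in ratings_by_item.items():
        # Quartiles by linear interpolation between order statistics.
        q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
        result[item] = (median(ratings), q3 - q1)
    return result


# Hypothetical ratings on the seven-point difficulty continuum.
example = {
    "Removing": [1, 1, 2, 2, 2, 3, 1, 2],
    "Calibrating": [4, 5, 4, 3, 5, 4, 6, 4],
}
values = scale_and_q_values(example)
for item, (s, q) in values.items():
    print(f"{item}: S = {s:.2f}, Q = {q:.2f}")
```

Items with low Q values are the ones on which the raters agreed most closely, which is why the selection step favors them.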

In order to select a subset of items (tasks) 
which would form a Thurstone equal appear- 
ing interval scale, items were sought which 
would represent all values along the psycho- 
logical “difficulty” continuum, have minimum 
Q values, and sample all technical areas per- 
formed in the ratings involved. 

Since it was hoped that two parallel scales 
could be constructed, two sets of items were 
selected. Because of the single items available 
at the extremes of the S value distribution, it 
was necessary to accept three items—7 (Re- 
moving), 9 (Replacing), and 11 (Trouble 
shooting /isolating malfunction(s) in)—for 
TABLE 1

TASKS SELECTED FOR SCALED LISTS

Scaled List A
  Removing
  Replacing
  Postflight inspecting*
  Periodically inspecting
  Inflight inspecting*
  Performing preventative maintenance
  Instructing others in the inspection of*
  Using appropriate test equipment for determining malfunctions in the
  Analyzing standard circuitry in
  Trouble shooting/isolating malfunction(s) in

Scaled List B
  Removing
  Replacing
  Employing safety precautions on
  Following block diagrams for
  Knowing relationship of equipment to other related
  Calibrating
  Employing electronic principles involved in maintenance of
  Trouble shooting/isolating malfunction(s) in

Note. The general directions stated that each item was to be interpreted as applying to the “equipment which is encompassed by the rating.”

* This task was dropped in the Guttman analysis.



both sets, thus introducing common elements 
into any scores based on the two separate 
scales. 

The selected tasks for the two scales (Scaled 
List A and Scaled List B) are indicated by 
the vertical lines in Figure 1 and are listed in 
Table 1. Although List B contains only 8
items, they are somewhat more evenly spaced
along the continuum than are the 10 items of
List A.


DEVELOPMENT AND ADMINISTRATION OF THE
INDIVIDUAL EVALUATION FORM


In order to establish scalability in the Gutt- 
man sense, the 28 tasks included in the previ- 
ously described preliminary task list were put 
in a form which would allow for the evalua- 
tion of individuals rather than tasks. The di- 
rections for this individual evaluation form 
differed from those of the preliminary task list 
in two respects: (a) they were oriented in 
terms of a specific man whom the rater had 
supervised rather than the average striker, 


and (b) they asked whether the man being
rated is checked out as being proficient on the 
task (i.e., is he capable of doing the task “on 
his own” without direct supervision) rather 
than how difficult the task is for the typical 
striker. 

The response alternatives available to the 
rater for each task for each man evaluated 
were: 

1. Has worked on task and is checked out
2. Has worked on task and is not checked out
3. Has not worked on task


This individual evaluation form was
administered to the same supervisors as the
preliminary task list. Each rater was asked to
evaluate a technician he had supervised; it
was suggested that the technician was not
necessarily to be the best or the poorest man
he had had under him. A total of 181
technicians were evaluated.


THE GUTTMAN SCALES 

The method of scalogram analysis proposed by Green (1956) was employed. This analytic method, an extension of Guttman’s technique, places emphasis on a single statistic, the index of consistency (I), in place of the several requirements for scalability proposed by Guttman. I relates the obtained reproducibility (which Green computes from summary statistics) to that expected by chance. He suggests that I should be .50 or greater, if the set of items is to be considered a scale in the Guttman sense. Green (1956) writes:

This criterion appears to give roughly comparable results to the many criteria used heretofore and will be helpful to those who desire to create a dichotomy of scales vs. nonscales (p. 87).

However, Green’s selection of a specific value of I for the break between scales and nonscales is an arbitrary matter. In general, the higher the I, the greater the confidence that can be placed in the scalability of the item set.
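
The relation between the index and the two reproducibility figures can be sketched as follows, assuming the usual form of the index, in which the excess of the obtained reproducibility over chance is divided by the largest excess possible. The function name and the numerical values are illustrative assumptions and are not figures from the present study.

```python
def index_of_consistency(rep, rep_chance):
    """Index of consistency, assumed here to be
    (Rep - Rep_chance) / (1 - Rep_chance).

    rep        -- obtained reproducibility (0 to 1)
    rep_chance -- reproducibility expected by chance (0 to 1, below 1)
    """
    if not 0.0 <= rep_chance < 1.0:
        raise ValueError("rep_chance must lie in [0, 1)")
    return (rep - rep_chance) / (1.0 - rep_chance)


# Hypothetical values: a reproducibility of .90 where .80 is expected
# by chance sits exactly at the suggested critical level of .50.
i_value = index_of_consistency(0.90, 0.80)
print(round(i_value, 2))
```

On this form, a high reproducibility is not sufficient by itself: when the chance figure is also high, the index remains modest.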

One problem was whether to attempt to 
scale the items for the technicians in each of 
the four ratings separately or to group the 
data from all four job specialties. Establish- 
ing the scalability of an item set over the 
four ratings separately would not thereby es- 
tablish its scalability over the total group. On 
this point, Guttman (1950) writes: “A
universe may not form a scale for the total
population, but still form a scale for subgroups of
that population” (p. 83).

On the other hand, it seemed reasonable to 
begin with an analysis of the entire sample 
since a finding of scalability at that level 
would lead to the conclusion that the item set 
also scaled within each rating group. In this 
connection Guttman (1950) states: “if a 
scale is obtained for a cross section of the 
population then that same scale pattern neces- 
sarily holds for all major subgroups” (p. 83). 
Therefore, in the present study, the analysis 
first treated the response data from all four 
ratings taken together, with the thought that 
if scalability was not established at that level, 
the analysis would then proceed to various 
combinations of three or two ratings. 

The items or tasks to be tested for Gutt- 
man scalability were those included in Scaled 
List A (10 items) and Scaled List B (8 
items) which had been selected, as described 
above, because they formed Thurstone equal- 
appearing interval scales. Guttman has not 
provided any method for the preliminary se- 
lection and ordering of a set of items. In this 
study, this was accomplished by using the re- 
sults of the Thurstone analysis. 

Since Green’s method requires dichotomous 
scoring, the “not checked out” and “not 
worked on” categories of the evaluation form 
were considered equivalent, as opposed to the 
“checked out” response. 
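
The dichotomous scoring, and the idea of reproducibility that it feeds, can be sketched as below. This is an illustration only: errors are counted here as cell-by-cell deviations from the ideal pattern implied by each technician's total score, which is one common convention and not Green's summary-statistics method. The function names and response codes are hypothetical.

```python
def dichotomize(code):
    """Score 1 for 'checked out' (code 1); score 0 for 'not checked
    out' (code 2) and 'has not worked on task' (code 3)."""
    return 1 if code == 1 else 0


def reproducibility(response_codes):
    """Proportion of responses reproduced by the ideal scale patterns.

    response_codes -- one row per technician; each row lists the
    1/2/3 codes for the items ordered from easiest to most difficult.
    """
    errors = 0
    cells = 0
    for row in response_codes:
        scored = [dichotomize(c) for c in row]
        total = sum(scored)
        # Ideal pattern: checked out on the `total` easiest items only.
        ideal = [1] * total + [0] * (len(scored) - total)
        errors += sum(a != b for a, b in zip(scored, ideal))
        cells += len(scored)
    return 1 - errors / cells


# Hypothetical codes for four technicians on four ordered tasks.
codes = [
    [1, 1, 1, 2],
    [1, 1, 3, 3],
    [1, 2, 1, 3],  # departs from the ideal pattern in two cells
    [1, 1, 1, 1],
]
rep = reproducibility(codes)
print(rep)
```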

When the responses to the 10 items of the Thurstone Scaled List A were subjected to a Guttman analysis, a reproducibility figure of .90 and an I of .42 were obtained. Since the I value did not reach Green’s critical level of .50, the next step was to consider dropping from the analysis the technicians of one of the ratings. Review of the data suggested that the responses of the TRADEVMEN differed from the responses of the other three ratings more than those of the three did from one another. But the I obtained without the TRADEVMEN, although higher, was only .47.

Since it seemed apparent that the disturb- 
ing influence was in the items rather than in 
the sample, the next step was to drop some 
anomalous items. There is some disagreement 
about the wisdom of this procedure. Green 
(1954), for example, writes: 




TABLE 2

RESULTS OF GUTTMAN SCALABILITY ANALYSIS

                                      STPCL A   STPCL B
Reproducibility                         .94
Reproducibility expected by chance
Index of consistency                    .57       .57


If a set of items does not scale, the possibility exists of rejecting one or two poor items, and then achieving a scale. Guttman is chary of this procedure, preferring to say that the universe is not scalable. However, it seems possible to have perfectly good items with the wrong form for the Guttman scale. To this author, the possibility of rejecting items seems to be a necessary part of any method of (attitude) [parentheses added] measurement (p. 35


Torgerson (1958, p. 330) takes the same point of view as Green.

Tasks 3 (Postflight inspecting), 4 (Inflight inspecting), and 28 (Instructing others in the inspection of) were eliminated from Scaled List A and the remaining seven items analyzed. The results are shown in Table 2. The obtained I of .57 is high enough to conclude that the seven items involved form a scale in the Guttman sense. These seven items are referred to as the Scaled Technical Proficiency Check List, Form A (STPCL A).

The results from the Guttman analysis of the eight items in the Thurstone Scaled List B are also presented in Table 2. The I of .57 indicates that these items also constitute a Guttman scale. These eight items are referred to as the Scaled Technical Proficiency Check List, Form B (STPCL B).

The value of I for STPCL A is inflated to some extent by the fact that, in deciding upon which items to drop, the response matrix was examined and, therefore, some advantage was taken of chance relationships in the data. Although I for STPCL A should be checked in another population sample to determine its value more accurately, the fact that STPCL B, from which no items were eliminated, scaled would suggest that the I given above for List A is not a gross overestimation.

² Quoted by permission from Gardner Lindzey, Handbook of Social Psychology, 1954, Addison-Wesley Publishing Company, Reading, Massachusetts.

DISCUSSION

The fact that it was possible in the present 
study to establish scales over four related but 
different naval job specialties has several sig- 
nificant implications. It is apparently possible 
to generalize a function by divorcing it from 
a specific context and still retain its meaning- 
fulness in different situations. This was the 
central problem faced in writing the items or 
task descriptions. That it is possible to scale 
these items means that the job proficiency of 
the technicians in the several ratings involved 
can be evaluated with reference to a hier- 
archy of job tasks and that if they are checked 
out on one task on a scaled list, it can be as- 
sumed that they are proficient on the tasks 
which are ranked below that one on the list. 
Application of the technique would seem to 
be of value in understanding the basic struc- 
ture of jobs and the interrelationships among 
them and to have significance for the develop- 
ment of training programs. It would also seem 
to be of value for describing the work per- 
formed by the men in related jobs, the se- 
quence of technical skill development, and for 
job evaluation. 


SUMMARY AND CONCLUSIONS 


Check lists for use in evaluating task per- 
formance in several related naval job special- 
ties (ratings) were shown to meet the Thur- 
stone and Guttman scalability requirements. 
The Scaled Technical Proficiency Check Lists 
evaluate the status of a technician with ref- 
erence to tasks normally performed by men 
of equivalent pay grade and rating. The lists 
contain only a relatively small number of 
items, so that they are simple and convenient 
to use. Yet, because the tasks included form
a scale, the score obtained from them can be 
generalized in meaning to the “universe” of 
tasks of which they are representative. 
INTERESTS OF ENGINEERS RELATED TO TURNOVER,
SELECTION, AND MANAGEMENT

J. B. BOYD

Hydro-Electric Power Commission of Ontario, Toronto

It is usually assumed that the extent to which a person’s work and work environment are compatible with his interests helps determine whether or not he stays on the job. The Strong Vocational Interest Blank (SVIB) predicts occupational stability (Strong, 1955). Does it predict job stability?

If turnover should prove to be predictable, it might be reduced either by screening of applicants or alternately by changing conditions in the organization so as better to meet the needs of those who would otherwise leave. Which of these two applications to make should be decided on the basis of some criterion of what constitutes a desirable population for the organization.

METHOD

The interests expressed through the SVIB by newly hired engineers who had left the organization after a short period of service were compared with those expressed by engineers who remained for a longer period. The comparison was made on the standard SVIB scales, then on each of the 400 items, and finally on certain groups of items which gave promise of being significant. Differences found in the last comparison were cross-validated. Further steps were then taken to determine whether the men who tend to leave or those who stay are more like high achievement engineers. The precise steps are stated in the reporting of results.

Subjects

Validation Group. From 1953 to 1955, inclusive, 157 engineers were employed and enrolled in a rotational training program of 18- to 24-months duration. The majority were graduates of the year in which they were hired. As part of a battery of tests in a long term validity study they completed the SVIB during their first month. They took the tests on the understanding that the results would not affect decisions regarding themselves but might be used with future groups for selection and placement if valid relations were discovered. The median age was 24.8 with an interquartile range of 23.4 to [illegible]. At the end of March 1956, 30 of these men had left service and they will be referred to as VL. The 127 who stayed on after the cutoff date will be referred to as VS.

Cross-Validation Group (C). A second group consisted of 70 men hired and enrolled in the training program in 1956 and 1957. The median age is 24.6 with an interquartile range from 22.9 to 26.4. They also completed the SVIB in their first month. They divide into CL, the 13 who left within their first 2 years of service, and CS, the 57 who stayed longer.

High Achievement Group (HA). This consisted of 99 engineers who had progressed relatively quickly to senior status in the organization. They were used to determine whether those who leave the organization or those who stay are more alike in interests to those who presently fill key posts. In this way they formed the basis for judging the appropriateness of selection vs. an alternative application of the results.

The selection of HA was based on salary level and age, with various kinds of work assignment represented.

Since the salary evaluation plan for engineers was revised on the basis of a complete job study in 1956, salary level was considered a good measure of the value to the organization of each engineer’s work. The top one-third of the engineers based on salary level included 343 men.

Age limits were added as a criterion for inclusion in the group as a means of controlling the rate of attainment and furthermore of reducing the age gap between HA and the junior group with which it was to be compared. Consistent with the other criteria, the group was chosen to be as young as possible. The upper age limit was 40 for the head office technical-administrative subdivision and 50 for the other two described in the next paragraph, since promotion is not so rapid for them. More than 90% of the total were 45 or under.

Engineers found in different kinds of assignment might be expected to show differences in interest patterns. In view of this the main divergencies should be represented in sufficient numbers to test for differences in interest. Two factors were considered important in this connection, namely, amount of technical content in the work and location. On this basis three subdivisions were recognized: technical-administrative, field; technical-administrative, head office; and technical, head office. It was considered desirable to have at least [illegible] in each subdivision and a total of about 100.

These specifications resulted in a list of 106 engineers. They were approached personally with an explanation of the research, a general indication as to how they had been selected, an explanation of the biasing effect which might occur if self-selection were superimposed, and a request for cooperation. There were 99 who complied by completing the SVIB on a self-administered basis during January and February of 1959. In view of this response the biasing due to the voluntary nature of the proposition was considered negligible.

The median age of HA is 36.5 years with an interquartile range from 34.2 to 41.0. The technical-administrative, field subdivision was represented by 27 men; the technical-administrative, head office by 41; and the technical, head office by 31.


RESULTS 
Comparison on Occupational Scales 


The SVIB was scored on Strong’s 45 occu- 
pational scales and on Specialization, Occupa- 
tional Level, and Masculinity-Femininity. A 
comparison of mean scores of VS and VL 
yielded 18 differences at p < .05, 11 of them
being at p < .01.

The combined predictive value of the in- 
terest scales was obtained by calculating the 
discriminant function. It identified correctly 
72% of CL but incorrectly placed 24% of CS 
in CL. 

Since Strong’s scales were designed
primarily to predict stability in various
occupations, it seemed likely that some other
treatment of the 400 items might prove more
predictive of stability within a particular
organization.


Item Analysis 


On the basis of item analysis six groups of items were developed each of which differentiated between VS and VL at p < .01.¹ The item groups were called “Mechanical-Technical,” “Competitive-Persuasive,” “Management,” “Repetitive Detail,” “Literary and Artistic,” and “Attitude to Irritating Characteristics in Others.” These six groups were used in cross-validation.²


¹ Tested by the ratio of the weighted mean difference to its standard error (Cochrane, 1954, pp. 445-446). These calculations and the composition of the “Management” item group are the work of Ben Kurtz, Professional Engineer of the Hydro-Electric Power Commission of Ontario.

² A fuller account which gives the SVIB scales which proved significant, explains the method of item analysis, and the development of the six item groups has been deposited with the American Documentation Institute. Order Document No. 6666 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress; Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to: Chief, Photoduplication Service, Library of Congress. A table of item weights for the four item groups which were successfully cross-validated, is also provided.

TABLE 1

DISTRIBUTION OF DISCRIMINANT FUNCTION VALUES
BASED ON FOUR ITEM GROUPS
(Cross-validation data)

Discriminant Function          Number
Value X 1000               Stayed    Left

+90 and higher               20
-10 to +89                   28
-110 to -11
-111 and lower

Totals

Cross-Validation

When the six scores were applied to the cross-validation group two of them failed to distinguish between CS and CL. The discriminant function of the remaining four was calculated and the distribution of these values is shown in Table 1. The rate of turnover is 7 out of 10 at the lowest discriminant function range whereas in the other three levels the ratio is in the neighborhood of 1 in 4. In view of the differentiation at the lower extreme it was surprising that the high end of the distribution does not discriminate better. It was noted that of the five in the top category who left, four were civil engineers in the 1956 class. Moreover, civil engineers account for [illegible] out of the 16 of the 1956 class who left in the first 2 years. The situation is shown in Table 2.

TABLE 2

RATE OF TURNOVER OF CIVIL AND OTHER ENGINEERS
(1956 class)

[Column headings partly illegible: Totals (No.), Stayed (No.), Turnover]

Civil Engineers    21    11
All Others         35    29

Totals             56    1)

The turnover rate for civils is over three times as great as for others. Among the latter the rate is in the range for succeeding classes over the years. It is obvious that some special influence has intervened. It is a matter of record that civil engineers in the 1956 class had great difficulty in getting placed within the organization. A reduction in the level of construction activities meant that fewer were needed than had been anticipated. This became apparent soon after they were hired and was known to them. It is therefore reasonable to suppose that a number of these men left because of the restriction of opportunity although their interests were compatible with the work and climate of the organization.

TABLE 3

DISTRIBUTION OF DISCRIMINANT FUNCTION VALUES
EXCLUDING CIVIL ENGINEERS HIRED IN 1956
(Cross-validation data)

Discriminant Function
Value X 1000

+90 and higher
-10 to +89
-110 to -11
-111 and lower

Totals

A truer comparison is probably obtained if
the 1956 civil engineers are omitted as is
shown in Table 3. The difference between the
two groups by the Mann-Whitney U test
(one-tailed) yields p < .01. Thus it is
concluded that the four item groups together
identify real differences between those who
left and those who stayed.³
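
The form of such a comparison can be sketched as follows. The scores are hypothetical and the function name is an assumption of the sketch. It uses the normal approximation without tie or continuity corrections, whereas with groups this small an exact table of U would ordinarily be consulted. The one-tailed probability is for the prediction that the first group scores lower than the second.

```python
import math


def mann_whitney_u(lower_group, higher_group):
    """Return (U, one-tailed p) for the prediction that scores in
    lower_group tend to fall below those in higher_group.

    U counts the pairs in which a lower_group score exceeds a
    higher_group score (ties count one half); a small U supports
    the prediction. p uses the normal approximation.
    """
    u = 0.0
    for x in lower_group:
        for y in higher_group:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(lower_group), len(higher_group)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Lower tail of the standard normal distribution.
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return u, p


# Hypothetical discriminant function values (X 1000).
left = [-150, -120, -40, 10]
stayed = [-30, 20, 60, 95, 120]
u_stat, p_one_tailed = mann_whitney_u(left, stayed)
print(u_stat, round(p_one_tailed, 3))
```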

Some description of the four item groups is appropriate here. The 13 items of the Literary and Artistic group include expressive activities, such as “author” and “artist,” as well as items which denote either the expressive or appreciative aspects such as “art,” “literature,” “music.” There is a dearth of items in the SVIB covering the purely appreciative aspect, understandable enough in a vocational instrument. The only one included in this group is the choice between reading a book and going to a movie.

³ The same trend is apparent in subsequent classes though the 2-year period is incomplete. For the 1958 class (after 18 months) those with scores predictive of leaving have a turnover rate of 50% as compared to 13% for the rest of the group. The figures for 1959 (after 7 months) are 36% and 12%.

There are 12 items in the Repetitive Detail
group. They include clerical detail,
precision manual work, and similarity of work
and conditions over a period of time.

The Mechanical-Technical group is made
up of 20 items which suggest working with
tools, applying technical knowledge, solving
mechanical problems, or invention.

The Competitive-Persuasive group is the longest (42 items) and is somewhat more varied. It includes not only a liking for persuasive activities and for being a contestant but also for taking chances and for being socially prominent. Six of the items included with this group also suggest an interest in management. Because of special concern over this area these six items were included in a separate item group along with others denoting managing and directing activities. This is the only case of overlap in the item groups. The Management item group, as has already been shown, did not differentiate between those who stayed and those who left. This indicates that it is not the management element in the Competitive-Persuasive group that accounts for its discriminating power.

Data showing the performance of each of the four item groups separately will be presented in the next section when a comparison is made with HA.
High Achiei 


ement Enevineers 


To determine whether those who left or 
those who stayed were more like engineers 
who have higher levels in the or 
ganization, the proportions of HA and of the 


cross-validation group (C) falling at various 


achieved 


score levels are compared in Tables 4 through 
» fg 

Table 4 shows that a larger proportion of 
HA have literary and artistic interests 


score) than is the case among C, i.e 


(low 
, among 
The effect of 
The situation 


the group as originally hired 
turnover is to increase the gap 


with repetitive detail as shown in Table 5 is 


In these tables, which summarize a fuller presentation
deposited with ADI (Document No. 6666 as
in Footnote 2), the data are grouped into unequal
intervals to highlight the trends. This does not affect
the statistical tests.
TABLE 4
DISLIKE OF LITERARY AND ARTISTIC PURSUITS
(High achievement engineers compared with junior engineers)


Proportionate Distribution 


Turnover 
pA 


Score Rate / 


+15to +24 ; 8 0.0 
0 to +14 5 8 9.1 
—30 to —1 43 37 28 7 38.5 
Total 1,00 
Number 99 


Note Differences: CS vs 
similar in that HA has a higher frequency in 
the low scores, which in this case indicate a 
dislike of such activities, and turnover in- 
creases the difference. When the scores from 
these two item groups are combined there is 
a difference in distribution of CS and CL 
which yields p < .001 and between HA and 
CS yielding p < .05.° 

To attempt to reduce turnover by screening 
out applicants having these interests does not 
appear to be a good solution. For, while the 
presence of these interests more frequently in 
the HA group does not prove them to be a 
critical requirement, nevertheless it does pro- 
duce a presumption in the face of which it 


TABLE 5

LIKING FOR REPETITIVE DETAIL
(High achievement engineers compared with junior engineers)

Proportionate Distribution

Score
+20 to +49
-20 to +19
-60 to -21 29 19
Total 1.00 1.00
Number 99 7 5


Note.—Difference .p<.10; HA S, p<.20 


In this, and in succeeding cases where probability
values are given, they are based on the Mann-Whitney
U test. Values between CS and CL, having been
predicted by the validation data, are one-tailed. All
others are two-tailed.
TABLE 6
LIKING FOR MECHANICAL AND TECHNICAL ACTIVITIES
(High achievement engineers compared with junior engineers)


Proportionate Distribution


Turnover 


score Rate “ 


+40 to +69 
+10 to +39 
40 to +9 


Total 1.00
Number 70 


Differences 


. CL, p <.20 


would be risky to deprive the organization of 
such men. Therefore the indicated course of 
action is to continue to hire these engineers 
and to endeavor to retain them through chang- 
ing conditions so as to give more outlet for 
these interests. 

Tables 6 and 7 point to an opposite con- 
clusion though not as definitely. Table 6 shows 
HA more frequently having an interest in me- 
chanical and technical pursuits. The pattern 
of turnover suggests a curvilinear relationship 
but since this does not occur in the validation 
group, the high rate at the upper score level 
must tentatively be regarded as fortuitous. In 
spite of this irregularity the effect of turnover 
on the whole is to decrease slightly the dif- 
ference between HA and CS. 

Table 7 shows HA highly concentrated in 


TABLE 7

DISLIKE OF COMPETITIVE-PERSUASIVE ACTIVITIES
(High achievement engineers compared with junior engineers)


Proportionate Distribution 


Turnover 


HA cL 


Rate ‘ 


06 3 16 12.5 
46 | 16.7 
42 16.7 
06 50.0 


1.00 


9Q9 
the central portion of the range, indicating a
moderately positive or moderately negative 
attitude toward Competitive-Persuasive ac- 
tivities. By contrast C has higher frequency 
at high scores, indicating a dislike, and a 
lower frequency in the slightly negative scores 
which indicate a mild liking for these activi- 
ties. While it might be desirable to attempt to 
readjust this balance in future hiring, there 
may also be a case for continuing to include 
a fairly high proportion of those disliking 
Competitive-Persuasive activities in view of 
their low turnover. The most obvious con- 
clusion from the table, however, is that there 
is little justification for hiring in the high 
turnover group, with scores below -50, since,
in addition to being poor risks for continued 
service, they do not often appear at the high 
achievement level. This rules these interests 
out as a critical requirement for high achieve- 
ment of an engineer in this organization. 

Though the differences pointed out in
Tables 6 and 7 are separately nonsignificant,
when the scores are combined and the Mann-Whitney
U test applied, the difference between
CS and CL yields p < .03.
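The Mann-Whitney U test named here can be sketched in a few lines. What follows is a minimal illustration, not the computation actually used in the study: it assumes the large-sample normal approximation with a tie correction and no continuity correction, and the function names `midranks` and `mann_whitney_u` are hypothetical. Where a difference has been predicted in advance, as with CS versus CL, the one-tailed value is half the two-tailed value when the difference lies in the predicted direction.

```python
import math


def midranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (u_x, u_y, z, p_two_tailed) for two independent samples.

    Assumption of this sketch: normal approximation with tie correction
    and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x

    # Tie correction: sum of (t^3 - t) over each set of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every observation tied: no evidence either way
        return u_x, u_y, 0.0, 1.0
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, z, p_two_tailed


# Hypothetical interest scores for those who stayed and those who left.
stayed = [12, 5, -3, 20, 8]
left = [-10, -22, 4, -15]
u_stayed, u_left, z, p = mann_whitney_u(stayed, left)
```
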

For purposes of distinguishing between HA
and the other groups Tables 6 and 7 do not
combine profitably because of the form of the
distribution. This means that it cannot be asserted
with confidence that turnover decreases
the difference between HA and C. However,
it is reasonably certain that it does not increase
it. Thus the reason for objecting to
screening seen in Tables 4 and 5 does not
appear here, and the interests represented in
Tables 6 and 7 might be combined in screening
new applicants with some confidence.

In these results the HA group has been
treated as a whole. Comparisons were made
between the three subgroups based on location
and degree of technical content, but no
significant differences in interests were found.

The specific findings apply, of course, only 
to the organization in which the data were 
obtained. The method and instruments, however,
might be applied elsewhere.


DISCUSSION 


The scales of the SVIB which were found 
to differentiate in the validation group may 
be related to factors derived by Strong (1943, 
p. 419). The Musician, Mathematician, Physicist,
Dentist, Engineer scales, associated in
this study with longer tenure, were all found 
by him to have positive loadings on his Fac- 
tor I, which he was inclined to call “Science,” 
following Thurstone. The remaining positive 
differences which were found in this study 
namely, Printer, Math-Science Teacher, Car- 
penter, Aviator, Policeman, Farmer—corre- 
spond to scales which have positive loadings 
on Strong’s Factor III. The scales which in 
this study were found negative, in the sense 
of being associated with short tenure, were 
Lawyer, Advertiser, Real Estate Salesman, 
Sales Manager, Life Insurance Salesman, 
President, Occupational Level. With the ex- 
ception of the last two these all have negative 
loadings on Strong’s Factor III. This is the 
factor which Strong wondered whether to call 
“Things vs. People” or “Language.” It ap- 
pears then that interest in science and in the 
concrete as compared with interest in people 
or the expression of ideas is associated with 
longer tenure by engineers in this particular 
public utility. 

The item groups in this study can also be 
compared to the results of factor analysis.
The Mechanical-Technical item group is prob- 
ably loaded with Strong’s concrete factor. 
More recently Cottle (1950) isolated a factor
he called “detail interest” which appears
to be close to the repetitive detail of this 
study. However, these studies all used item 
groups which could not be expected to be 
homogeneous, Thurstone and Strong using oc- 
cupational scales and Cottle using even groups 
of occupations. Furthermore Cottle omitted 
the whole of Strong’s Group IV which includes 
all the mechanical occupations of the SVIB.

A more systematic exploration of interests
was made by Guilford, Christensen, Bond, and
Sutton (1954), who constructed a total of 100
relatively homogeneous 10-item interest questionnaires.
From among their 28 factors some
parallels can be drawn with identifiable elements
in the item groups of this study. These
parallels are drawn purely on the basis of descriptive
similarity and without seeing the
actual items used by Guilford et al. They are
shown in Table 8. From this comparison it
appears that the Mechanical-Technical item
group is the most unitary and the others quite
TABLE 8

DESCRIPTIVE SIMILARITY OF ITEM GROUPS AND INTEREST FACTORS
OF GUILFORD, CHRISTENSEN, BOND, AND SUTTON

Item Groups                 Interest Factors

Literary and Artistic
  appreciation              esthetic appreciation
  expression                esthetic expression
                            cultural interest
Repetitive Detail
  clerical                  clerical
  manual precision          precision
  unchanging pattern        variety
Mechanical-Technical        mechanical
Competitive-Persuasive
  competitiveness           cultural conformity (includes competition, status)
                            social initiative (includes competition)
  persuasiveness            business interest
                            ambition (includes persuasion)
                            adventure vs. security
                            expressiveness vs. restraint (includes risk taking)
  social prominence         need for attention (includes recognition, status, exhibition)


mixed factorially. To obtain factorially pure 
measures involves much more than a rear- 
rangement of the SVIB items, or even the ad- 
dition of items to round out its coverage of 
interests. The factor analysis of interests sug- 
gests that most items themselves contain con- 
siderable mixture of factors. 

The comparison made in this study be- 
tween junior and senior engineers differing in 
median age by 12 years is considered justi- 
fied in view of the evidence for the stability 
of interests after college graduation (Strong, 
1955). 

A word should be said about probability 
levels used in this study. In some comparisons 
limits of .03 and .05 have been the basis for 
conclusions. 

The justification for this is that in a practical
situation, as Burd (1959, p. 49) has
said, the consequence of Type II error may
have to be taken into account. Where a management
decision must be made, implicitly or
explicitly, between two alternate courses of
action is surely such a case.
It would be desirable to have a second 
group of senior engineers to verify the con- 
clusions based on comparison with them. 
However, in the particular organization such 
a group will not appear again for another 10 
years or so, since all who fitted the specifica- 
tions of age and level of attainment were in- 
cluded. They constitute then all of the most 
recently arisen formal leaders amongst engi- 
neers and in this immediate sense they are a 
universe rather than a sample. They are, of 
course, a temporal cross section of a universe 
made up of successive leadership groups. 

Of more serious concern than sampling is 
the assumption that the present leadership 
group provides the best model for the future. 
A good case can be made for the adequacy 
for the past of the leadership of the particular 
organization. In order to choose a better cri- 
terion for the future it would be necessary to 
foresee future developments and to under- 
stand what new demands these would place 
upon the leadership. Thus the method of this 
study, if uncritically used, might result in a 
perpetuation of a current pattern inappropri- 
ate for the future. However, the greater the 
knowledge of current leadership character- 
istics the more possible it is to vary the pat- 
tern deliberately in the direction of foresee- 
able needs. 

A great deal of evidence has been cited
(Argyris, 1957) to show that organizations
are not necessarily adaptive, either in relation
to their overall objectives, or in relation
to the needs of their individual members. 
Studies such as the present one, particularly 
if they explore a broad range of character- 
istics and determine the extent to which they 
are critical, could be used to guide the man- 
agement of an organization in creating condi- 
tions which will encourage the retention and 
development of people whose individual needs 
are likely to be fulfilled in advancing the or- 
ganization’s purposes. 


SUMMARY 


Using the Strong Vocational Interest Blank,
four groups of interest items were found to
distinguish engineers who left the service of
a particular electrical utility within the first
2 years from engineers who stayed longer.
Comparing the interests of engineers who had
assumed senior responsibility in relatively
short time confirmed the suspicion that some
of the men who were leaving might be those
who were similar in interest to the leaders.
Thus, those likely to leave the organization
could be separated into two identifiable
groups. One group it is considered relatively
safe to screen out at the time of application.
The other group should be encouraged to stay
by efforts to change conditions in the organization
so as to provide better satisfaction for
their interests. It is suggested that, as well as
selecting suitable people, an organization may
need to adapt itself so as to satisfy the needs
of the kind of people it requires.
A COMPARISON OF INDIVIDUALS VERSUS GROUPS
IN JUDGING PERSONALITY ¹


VICTOR B. CLINE AND JAMES M. RICHARDS, JR.


University of Utah 


As a practical necessity men are continually 
required to subjectively judge, assess, and 
evaluate their associates. Frequently in the 
military or in industry this is a prerequisite 
in initial employment, promotion, etc. There 
have been various approaches to the quantification
of subjective judgments of which perhaps
the most common have been rating procedures.
Since this type of judgment and the
decisions or courses of action which result
therefrom have so many far reaching implications,
any research which might further contribute
to our knowledge in this area should
be of considerable intrinsic importance. The
purpose of the present study was to (a) determine
whether individuals or groups are
more likely to be accurate in making social
judgments (i.e., “predictions” of the behavior
and personality of other individuals), and (b)
at the same time compare different types of
group judgment. These judgments were made
on instruments similar to two different kinds
of rating scales commonly used in applied
settings.


The rationale of this experiment grew out 
of the recent survey of studies comparing 
group performance and individual perform- 
ance made by Lorge, Fox, Davitz, and Bren- 
ner (1958). The general conclusion of this 
survey was that a group, on almost any task, 
will perform better than a typical individual, 
but not necessarily better than a superior in- 
dividual on the task in question. This finding 
is true whether the “group performance” is
made by a genuine group or is merely a sta- 
tistical combination of several independent in- 
dividual performances. An unresolved ques- 
tion is the degree to which these findings can 
be attributed to a reduction in the variability 
of the group performance. 

¹ This research was supported under Contract Nonr
1288(04), Group Psychology Branch, Office of Naval
Research.

The trend of the studies cited in this survey
suggested the hypothesis to be tested in
this experiment. This hypothesis is:

The accuracy of predictions (about the be- 
havior of other persons) made by a group of 
persons arriving at a consensus prediction 
through group discussion will be significantly 
greater than the average accuracy of the pre- 
dictions made by the individuals composing 
the group. The average accuracy of the pre- 
dictions made by the individuals composing 
the group will also be significantly less than
the accuracy of an “artificial group” (composed
of pooled independent judgments for
each item) and also less than the accuracy of
prediction of the best individual among the 
individuals composing the group. 

A secondary question relates to the pres- 
ence or absence of a consistent pattern of su- 
periority in accuracy among predictions made 
by best individual judges, consensus groups, 
and artificial groups composed of pooled in- 
dependent judgments for each item. 


METHOD

Measures

The subjects were 186 students, both male and female,
in the introductory psychology classes at the
University of Utah in the fall of 1959. The procedure
involved the presentation of six filmed interviews or
“standard others.” These were photographed in sound
and color, and were conducted by an actor, a member
of the university theatre staff, who asked a fairly
standard series of questions (to insure equivalence
over interviews) probing the following areas: personal
values, personality strengths and weaknesses,
reaction to the interview, hobbies and activities,
self-conception, and temper.

After a filmed interview had been shown the projector
would be stopped and the subject-judges required
to fill out paper-pencil judging instruments.
Following this another interview would be shown
and so forth. Details of the development and selection
of these films, the experimental procedures involved,
and certain underlying methodological and
theoretical considerations have been published elsewhere
(Cline & Richards, 1958, 1960a).

In this study, two prediction instruments were
used. The first of these was the Adjective Check List (


ACL), which required the subject to determine terviewees to individual items, and 
which of a pair of adjectives the interviewee had the degree to which judges correctly 
checked as being descriptive of himself. A sample viewees in terms of their overall d 
item is osity.” It, therefore, is the best measure of the kind 
14. — - (a) resourceful of accuracy that is the main concern in most institu- 
(b) cheerful tional rating situations. Interpersonal Accuracy, like 
Stereotype Accuracy, has two independent param¢ 


There » 20 such pairs for each of the six films 
ters, a correlation term ss l term yf Fisher 


making a total of 120. The score on the ACL was 
the number correct. Thus the ACL is similar to a 
forced-choice rating procedure 

The second instrument used was the Belief-Values 
Inventory (BVI). On this instrument the subject 
was required to determine (predict) how the inter sponding actual values on ir 
viewee had responded to a Likert-typx ile dealing 
with religious beliefs. During the course of the intet 
view, the person in the film had been asked direct 
questions in this area. A sample item is 


z, and a variance term, thus rmitting independent 
evaluation of accuracy and variability. The 


} 


tion score is computed by 


between each judge’s predict 


ing to Fisher's z 


Variance 
items and 


I feel quite sure God does not exist Proce dure 


) Strongly ag - 
I “~ : The 186 
into 62 three 


at the time 


2) Agree 
Neither ag 
Disagree 


Strongly disagree 


groups cor 


other in 
Thus the BVI is comparable to a graphic rating pro 


cedure 


There were 12 such items for each film or inter 
view. Several different scores based on a recent modi- 
ion by Cline and Richards (19601 

lytic procedure suggested by Cronbach (1955 items oF 
computed from judges’ responses to this in n back to, or 
using a program developed for the IBM 65 ments 
puter. The first of these was a total scor V n is The “art 
based on the average of th juared discrepancies pooled 

ising the one- to five-point l n pred derived 
responses by each ju 1 members 


r 
LOT 


actual responses of each interviewee. T S al or judgment was 
score, and in order to make these scores ble jority vot 
to other scores used in this study, th Vr ing their 


conver! to accuracy score througn ; standar it was ca 


ilues predict 


score transformation, s¢ } meat } \ I 


and standard deviation eq 
The second two BVI scores are comp that this a1 


vhat Cronbach 1955) as illed ‘ reoty chological 


This measures the d w! of the 
th 


le g 


viewee 


J idge predicts how 
whole responds to the judging 
volves the degree to which the mean items (av 
eraged across interviewees) predict y each judge Y 
correspond to actual iten The two scores portant note that 
in this study ar ’ correlation between irtificial group procedur 
judge’s predicted iten ns and obtained item item by item predi 
ins, converted to a Fisher’s z, and the ri bers that were averag 
ance of each judge’s predicted means. Cror curacy scores 
demonstrated these two scores to be t r The “best judg 
ters in Stereotype Accuracy when tl riter basis of his 
held constant, and they permit independent ults of this stud 
tions of the effect of grouping on accuracy that this selection 
variability of prediction in this study basis, thus maximizing 
The last two scores on the BVI are measures of dition by capitalizi ; 
Interpersonal Accuracy. This represents the degree to be impossible for a best judge 
which judges accurately predict the responses of in obtain a higher 
TABLE 1

MEANS AND STANDARD DEVIATIONS OF JUDGMENT SCORES

                   Average of                 Three-        Artificial Group
                   Individuals                Person        Derived by Pooling
                   Composing      Best        Group         Three Independent
Judgment Score     the Group      Judge       Consensus     Judgments

Adjective Check List
  X                7.27 101.66
  σ                3.91
Belief-Values Inventory
  Total
    X
  Stereotype Accuracy Variance
    X
    σ
  Interpersonal Accuracy
    X
    σ
  Interpersonal Accuracy Variance
    X
    σ

would, in fact, probably score somewhat lower, since
some error would be involved in any advance selection.
The best judges were selected independently for
the ACL and the BVI and therefore were not necessarily
the same person on the two different instruments.
On the BVI, however, the best judges, selected
on the basis of total score, were also used as
best judges in making the comparisons involving the
other scores derived from this instrument.


RESULTS


The mean and standard deviations for each
judgment procedure on each judgment score
are presented in Table 1. In Table 1, all scores
are accuracy scores. Since total score on BVI
is based on error score, in Table 1 this judgment
score is transformed to a standard score


TABLE 2


RESULTS OF OVERALL F TESTS FOR JUDGMENT SCORES 


ween- Variance Within-Variance 
Tie 


Judgment Score df=3) df=244 


Adjective Check List 
lotal $51.82 
Belief-Values Invent 
Inventory Total 1423.02 
Stereotype Accuracy 
Stereotype Accuracy Variance 
Interpersonal Accura 


Interpersonal Accuracy Variance 
TABLE 3


TESTS FOR SIGNIFICANCE OF DIFFERENCE BETWEEN INDIVIDUAL MEANS FOR EACH JUDGMENT SCORE


Average of
Individual
vs. Best
Judgment Score Judge
Adjective Check List
Total
Belief-Values Inventory
Total
Stereotype Accuracy z
Stereotype Accuracy Variance
Interpersonal Accuracy z


Interpersonal Accuracy Variance


Note Entries in this table represent the absolut 


* p< 5 
p<.0 


> <.01 


distribution with mean = 50 and standard de- 
viation = 10. 
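The conversion of an error score to a standard score with mean 50 and standard deviation 10 can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' program: the sign is reversed so that a small error becomes a high accuracy score, the population standard deviation is used, and the function name `error_to_accuracy_scores` is hypothetical.

```python
from statistics import mean, pstdev


def error_to_accuracy_scores(error_scores, target_mean=50.0, target_sd=10.0):
    """Rescale error scores so that low error maps to a high standard score.

    Assumptions of this sketch: the sign is flipped (accuracy = -error in
    z-score units) and the population standard deviation is used.
    """
    m = mean(error_scores)
    s = pstdev(error_scores)
    if s == 0:  # all judges equally in error: everyone sits at the mean
        return [target_mean for _ in error_scores]
    return [target_mean - target_sd * (e - m) / s for e in error_scores]


# Hypothetical mean squared discrepancies for five judges.
errors = [0.8, 1.4, 1.1, 2.0, 0.5]
accuracy = error_to_accuracy_scores(errors)
```

The judge with the smallest error receives the largest transformed score, which puts this measure on the same footing as the other accuracy scores.
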

As a first step in the statistical analysis 
of these data, overall F tests were calculated 
for each of the judgment scores separately. 
The results of this analysis are presented in 
Table 2. No test for homogeneity of variance 
was made before calculation of these F tests. 
This procedure was followed because the re- 
cent work of Boneau (1960) strongly suggests 
that F is not significantly affected by hetero- 
geneity of variance if the sample sizes are 
identical and relatively large, i.e., 20. Both of 
these conditions hold in the present study. It 
is also known that available tests for homo- 
geneity of variance are affected too much by 
other variables than that involved in the null 
hypothesis to justify their use prior to an 
analysis of variance (Box, 1953). 
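For readers who wish to retrace the overall F tests, a minimal sketch of a one-way analysis of variance is given below. It assumes a plain one-way layout, which is consistent with the degrees of freedom reported in Table 2 (3 and 244, as would arise from four judgment procedures with 62 scores each); the function name `one_way_anova` and the data are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g
    )
    df_between = k - 1
    df_within = n_total - k
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three procedures with three groups each.
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```
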

Since all of the F tests in Table 2 are sig- 
nificant at or beyond the .01 level of confi- 
dence, a test for significance of difference be- 
tween individual means was made. This test 
was made using the multiple range test (Li, 
1957, p. 238), which is the most appropriate 
procedure known to the experimenters for 
making “post-mortem” type comparisons be- 
tween individual means after an overall F 
test has been made. Briefly, the multiple 
range test involves computing a value which 

represents how large the difference between
two means must be in order to be significant
at a stated level, and then comparing the obtained
difference to this value. Results of this
analysis are summarized in Table 3.

Average of Individual vs. Group Consensus | Average of Individual vs. Artificial Group | Best Judge vs. Group Consensus | Best Judge vs. Artificial Group | Group Consensus vs. Artificial Group

SO
4.45**
16*
Qo**
01
00


DISCUSSION 


On each of the four accuracy measures, the 
best judge and both group judgments are sig- 
nificantly superior to the average of the indi- 
viduals composing the group. Thus, the ma- 
jor hypothesis of this experiment is confirmed. 
There is no consistent pattern of significant 
differences among the first three procedures 
mentioned above. As would be expected, on 
the two scores representing the amount of 
variability in predictions, the artificial group 
mean tends to be lower than the means of the 
other three procedures. This tendency is significant,
however, only for the Interpersonal
Accuracy variance score. It is somewhat surprising
to find that the artificial group (or
pooled independent judgments of items) is
superior to the best judge on the ACL. The 
interpretation of this finding seems to be that 
if both other judges disagree with the best 
judge, they are more likely to be right than is 
the best judge. If, on the other hand, only 
one of the other judges disagrees with the best 
judge, he is more likely to be wrong than is 
the best judge. 
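The interpretation just offered amounts to a majority rule over three independent judges. The following sketch shows how an artificial group score on a forced-choice instrument such as the ACL might be formed; the pooling rule (item-by-item majority vote), the function names, and the answer patterns are assumptions made for illustration only.

```python
from collections import Counter


def pooled_choice(choices):
    """Majority answer among an odd number of forced-choice responses."""
    return Counter(choices).most_common(1)[0][0]


def number_correct(answers, key):
    """Count the items on which the answers agree with the key."""
    return sum(a == k for a, k in zip(answers, key))


def artificial_group_score(judge_answers, key):
    """Score the item-by-item majority vote of several independent judges."""
    pooled = [pooled_choice(item) for item in zip(*judge_answers)]
    return number_correct(pooled, key)


# Hypothetical five-item key and three judges, each wrong on a different item.
key = list("aabba")
judges = [list("aabbb"), list("abbba"), list("babba")]
individual_scores = [number_correct(j, key) for j in judges]
group_score = artificial_group_score(judges, key)
```

In this constructed case each judge misses one item, but because no two judges miss the same item the pooled answers are all correct, so the artificial group exceeds even the best of its members.
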

This study clearly implies that satisfactory 
ratings are least likely to be obtained from a 
single unselected individual. In exploring further
implications of these results for an operational
rating set up, several other considerations
enter in. The first of these is that
typically the best judge would be difficult to 
select on an a priori basis, and (because of 
selection error) best judges selected a priori 
would probably score lower than the best 
judges used in this study. Since each of the 
group procedures produces results roughly 
equivalent to the best judge selected on an 
after-the-fact basis, an extensive (and expen- 
sive) effort to identify best judges and use 
them as raters would appear to be unneces- 
sary. 

The second consideration involved in apply- 
ing these results is that by far the most time 
in this experiment was consumed in arriving 
at consensus judgments through group discus- 
sion, a finding which one would certainly ex- 
pect to generalize to other situations. Since 
the artificial group (or pooled independent 
judgments of items) procedure produced re- 
sults as good as or better than the results 
produced by the consensus judgment, and re- 
quired much less time, it would appear to be 
most appropriate when accuracy and time are 
both considered. Thus, the best procedure for 
using ratings in many applied situations 
would be to obtain several independent rat- 
ings from different raters for each ratee, and 
then combine these ratings statistically into 
a single rating. It should be noted, however, 
that the superiority of the artificial group in 
terms of time required (and therefore ex- 
pense) might disappear if only a single sum- 
mary rating were required rather than the 
many relatively specific judgments required 
by the experimental procedure used in this 
study. 

A limitation to these conclusions is the fact 
that each rater in this experiment was basing 
his ratings on the same or identical informa- 
tion (i.e., seeing the same movies of the in- 
terviews). If different raters are basing their 
ratings on different information, some other 
procedure involving the sharing of this infor- 
mation might be superior. 

In addition to the practical implications 
outlined above, these results present, in the 
opinion of the authors, at least two more basic 
additions to previous psychological research. 
The first of these is the demonstration through 
both the Stereotype Accuracy correlation term
and the Interpersonal Accuracy correlation
term of the BVI that accuracy is increased 
through grouping independent of a reduction 
in variability (see Table 1). Unlike the other 
results of this experiment, this would not 
necessarily be expected on the basis of previ- 
ous studies comparing group and individual 
performance, although it certainly is consist- 
ent with previous studies. The second major 
addition is related to the current controversy 
in the “interpersonal perception” literature 
over the relative merits of various different 
types of accuracy scores (Cronbach, 1955). 
In the current study the total score on the 
ACL, the total score on the BVI, and the 
Stereotype Accuracy and Interpersonal Ac- 
curacy correlation terms all gave consistent 
results and, more important, results which 
make sense in terms of previous research com- 
paring group and individual performance. 
This would lead one to hope that the inter- 
pretations of different types of accuracy scores 
have more in common than previous investi- 
gators have thought. 
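The two Stereotype Accuracy terms referred to above, a correlation between a judge's predicted item means and the obtained item means converted to Fisher's z, and the variance of the judge's predicted means, can be sketched as follows. The data layout (one row per interviewee, one column per item), the use of the population variance, and the function names are assumptions of this illustration, not details taken from the original scoring program.

```python
import math
from statistics import mean, pvariance


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def stereotype_accuracy(predicted, actual):
    """Return (r, fisher_z, variance_of_predicted_item_means).

    predicted[i][j] is one judge's predicted response of interviewee i to
    item j; actual[i][j] is that interviewee's own response (assumed layout).
    """
    predicted_means = [mean(col) for col in zip(*predicted)]
    actual_means = [mean(col) for col in zip(*actual)]
    r = pearson_r(predicted_means, actual_means)
    # Fisher's z is undefined at r = +/-1, so clip very slightly inside.
    clipped = max(min(r, 0.999999), -0.999999)
    return r, math.atanh(clipped), pvariance(predicted_means)


# Hypothetical responses of two interviewees to three five-point items.
predicted = [[1, 2, 3], [3, 4, 5]]
actual = [[2, 2, 5], [2, 4, 5]]
r, z, spread = stereotype_accuracy(predicted, actual)
```

Keeping the correlation term and the variance term separate is what allows the effect of grouping on accuracy to be examined apart from its effect on variability.
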


SUMMARY 


The purpose of the research reported here 
was to compare accuracy of “interpersonal 
perception” or judging ability of individuals 
versus groups. This was accomplished by re- 
quiring the subjects (or judges) to predict 
the responses of six persons (seen serially in 
sound color movies of interview situations) 
to questionnaires dealing with values and self- 
concepts. In the present experiment, these 
movies were viewed by 186 individuals, who 
were also divided into 62 groups composed of 
three persons each. The procedure was for the 
three subjects composing the group to first 
individually and independently “predict” how 
the person interviewed in the movie had re- 
sponded to the questionnaires, and then later 
through group discussion to arrive at a con- 
sensus prediction (without referring back to 
their earlier predictions). A comparison was 
made of the accuracy of (a) the average of 
the total accuracy scores of the independent 
predictions made by each of the three persons 
composing the group, (b) the group consensus
predictions, (c) the accuracy of an “artificial 
group” derived through a statistical combination
of the independent item by item predictions
of these same three persons, and (d) the
accuracy of the “best judge” from each group. 

The results indicated that the average ac- 
curacy of the individuals is significantly in- 
ferior to any of the other three procedures, 
but that there is no consistent pattern of 
superiority in prediction among these pro- 
cedures. These results suggest that in applied 
settings accurate ratings are least likely to be 
obtained from unselected individuals. How- 
ever, in terms of time, money, and procedural 
difficulties the artificial group procedure or, 
in other words, the pooling of several inde- 
pendent judges’ ratings (by items) appeared 
to be the most satisfactory procedure. 
DESIRABLE ATTRIBUTES OF WORK:


FOUR LEVELS OF MANAGEMENT DESCRIBE THEIR JOB 
ENVIRONMENTS 


HJALMAR ROSEN 


University of Illinois 


Although there has been much valuable im- 
pressionistic material presented in the quasi- 
professional literature, particularly with re- 
spect to top echelon management (Argyris, 
1954; Smith, 1955; Warner & Abegglen, 
1955; Whyte, 1954) one is provided with 
little in the way of firsthand reports. The 
purpose of this study, then, is to study how 
various managerial echelons describe their job 
environments with respect to conditions of 
work they deem important. Do differences 
exist among the various classes of manage- 
ment personnel within the hierarchy? 

In an earlier analysis (Rosen & Weaver, 
1960) it was pointed out that there was little 
to distinguish among four classes of man- 
agerial personnel studied with regard to what 
they considered to be important conditions of 
work. Utilizing the same “desired” conditions,
to what extent do members of the four man- 
agerial classes perceive those conditions to 
characterize their environment? From casual 
observation of the industrial scene, one might
assume that job conditions would vary considerably
among as diverse managerial classes
as first line supervision, time and motion en- 
gineers, general production foremen, and the 
vice president in charge of production. Cer- 
tainly the physical environments differ con- 
siderably, but in terms of commonly desired 
job characteristics, would the differences be 
as great? 
METHOD

Sample

The subjects of this study made up the complete
managerial personnel (excluding the plant manager)
of a moderately sized, heavy equipment manufacturing
concern in a rather small midwestern, urban
center. The “managers” were divided into four
categories in terms of organizational charts, status, and
job duties. Seven men fell into the top manager
category, 36 into a middle management category, 65
into a staff specialist category, and 46 into a first line
supervision category. Top management personnel
included such job titles as production manager, sales
manager, etc. Middle management included such
positions as accounting supervisors and general
foremen. Staff specialists included all nonsupervisory,
technical personnel such as time study men, methods
engineers, personnel specialists, accountants, etc. And
first line supervision, as the class title implies,
included all departmental foremen. The staff specialists
were considered members of management, and in the
words of the plant manager “are the reservoir upon
which we draw for higher managerial posts.”

Method

Twenty-four desirable conditions of work within
four major areas (Rosen & Weaver, 1960) were
presented to the subjects (see Table 1). They were
asked to use seven response categories to indicate the
extent to which they perceived these conditions to
exist within their job environments. The response
categories ranged from “Always” through “Never”
and in addition were “spelled out” in behavioral
terms.¹

Four basic analyses of the data were undertaken.
First, means and standard deviations of each item
for each of the four managerial classes were
computed. Second, sign tests were applied to all possible
managerial group pairings in terms of the 24 item
means. Third, t tests were run for each of the 24
items on all possible managerial group pairings.
Finally, Pearson r correlations between all possible
pairings of groups across all 24 descriptive conditions
were computed.

¹ The following responses and response definitions
were provided: “Always”: “From your point of view,
this always exists in your job. You cannot think of
any instance when this was not the case.” (1) “Large
Share of Time”: “From your point of view, this exists
in your job a large share of the time. Although you
might be able to think of exceptions, you would be
hard pressed to do so.” (2) “More Often Than Not”:
“From your point of view, this exists in your job more
often than not. You can quite easily think of times
when it did not.” (3) “About as Often as Not”: “From
your point of view, this exists in your job about as
often as not.” (4) “Less Often Than Not”: “From your
point of view, this exists in your job less often than
not. You can quite easily think of times when it does.”
(5) “Small Share of the Time”: “From your point of
view, this exists in your job only a small share of the
time. Although you might be able to think of times
when it did occur, you would be hard pressed to do
so.” (6) “Never”: “From your point of view, this never
exists in your job. You cannot think of any instances
when it did.”
TABLE 1

TWENTY-FOUR CONDITIONS OF WORK USED AS STIMULI ITEMS

Area 1. Relations with Superiors
1. Having the opportunity to talk over problems with my superior
2. Knowing whose or [...]
3. Working under a super [...]
[Items 4 through 15 and the headings for Areas 2 through 4 are not legible.]
16. Having the other managers at my level recogniz [...]
17. Having knowledge of [...]
18. [...] mutual cooper [...] my level
19. Working with fellow managers who recognize the problems involved in my work
20. Working with fellow managers who will help me out when I get into a jam
[Items 21 and 22 are not legible.]
23. Having sufficient authorit [...]
24. Having management meetings where everyone can have his say

RESULTS AND DISCUSSION

Reviewing Table 2, in general it appears
that the environments of the four levels of
management studied are relatively rich in desired
conditions of work. More specifically, all
levels stated that in the area of Relations with
Superiors: superiors were willing to talk
over problems with them (Item 1); they
knew whose orders to follow (Item 2); superiors
recognized problems involved in the
work of their subordinates (Item 9); and the
superior would help out a subordinate who
was in need (Item 10). There was no general
agreement in terms of high incidence of occurrence
in Area 2, Relations with Company,
although all levels with the exception of staff
specialists indicated that they perceived the
plant to be operated efficiently (Item 14)
with clear cut, long range objectives (Item
15). Area 3, Relations with Peers, had two
areas of reported high incidence for all levels:
cooperative (Item 18) and helpful (Item 20)
fellow managers. Area 4, Decision Making
and Implementation, had no items indicated
as having high incidence for all levels. It should
be pointed out that this area contained the
only cases where any management level indicated
conditions of work being less rather
than more characteristic of their job environments;
specifically with regard to sharing in
policy making decisions (Item 21) and having
meetings where everyone can have his say
(Item 24).

These results suggest that the work environments
of the four levels of management
studied, although differing along a physical
continuum sharply, provided for the social
and psychological needs that were manifest.
Moreover, there is some justification for
pointing up the area of decision making and
implementation as the one in which desirable
conditions of work were least prevalent.

A sign test analysis of the data indicated
that on 21 of the 24 items the means for top
management exceeded those of staff specialists
(.01 level) and on 18 items they exceeded
the means of first line supervision (.05 level).
Middle management’s item means exceeded
those of staff specialists in 24 cases (.01
level) and those of first line supervision in 23
(.01 level). Top and middle management
TABLE 2

MEANS AND STANDARD DEVIATIONS RE EXTENT TO WHICH WORK CONDITIONS
CHARACTERIZE JOB ENVIRONMENTS OF FOUR LEVELS OF MANAGEMENT

Columns: Item; Mean and SD for each of Top Mgt, Middle Mgt, Staff Spec, and First Line
Row groups: Area 1, Relations with Superiors; Area 2, Relations with Company;
Area 3, Relations with Peers; Area 4, Decision Making and Implementation
[The cell values cannot be assigned to rows and columns in this copy.]

were undifferentiated in this analysis, 14 and
10, respectively, and staff specialists and first
line supervision were undifferentiated, 9 and
14, respectively. Essentially, then, in terms of
perceived degree of occurrence across all job
conditions, there seem to be, literally, two
significantly different groups made up of
subclasses that were undifferentiated.

Regardless of the preliminary nature of
such analyses, the findings are in keeping
with the current belief that the higher one
goes in the management hierarchy, the greater
are the rewards of the environment. It should
be noted, however, that the findings suggest a
dichotomization rather than an orderly progress.
It appears that advancement from low
echelons to high, at least within this organization,
would bring about greater frequency of
desired conditions of work, but not advancement
within either the low or the high classifications.
Apparently, until one achieves a relatively
high supervisory status within this
organization, major increases in desired job
conditions do not tend to occur.

From Table 3, it is apparent that the means
of top- and middle-management personnel did
not differ from one another significantly on
any item. Top management differed significantly
from first line supervision on two items
and from staff specialists in one. This minimal
frequency of significantly greater occurrence
of desirable job conditions in this case may be
largely a statistical artifact, however, due to
the large standard error resulting from the N
of seven in the top management level. Middle
management’s item means were significantly
greater than those of the staff specialists in 11
cases including all conditions relating to Area
4, Decision Making and Implementation.
Their means were significantly greater than
first line supervision in seven cases, scattered
among all areas. First line supervision’s item
means exceed those of staff specialists in five
cases; none of the cases fell within the area
of relationships with supervision, however.

These results suggest that the lower management
echelons have the least rewarding
job environments, but perhaps more important,
that the staff specialists rather than their
organizational inferiors, first line supervisors,
report less incidence of favorable conditions
of work than any other level. The widely discussed
problem of the relative isolation of the
technician and foreman from the mainstream
of management and their organizational superiors
may account in part for such findings.
The relatively richer job environment of first
line supervision relative to that of staff specialists
perhaps may be accounted for in
terms of line staff differences in function.

TABLE 3

SIGNIFICANT DIFFERENCES BETWEEN MANAGERIAL LEVELS IN
TERMS OF THE EXTENT TO WHICH WORK CONDITIONS WERE DESCRIBED
AS CHARACTERIZING THEIR RESPECTIVE JOB ENVIRONMENTS

[The t values and their significance markers cannot be assigned to items and group pairings in this copy.]

TABLE 4

PEARSON r CORRELATIONS ACROSS TWENTY-FOUR JOB
CONDITION ITEM MEANS AMONG FOUR LEVELS OF
MANAGEMENT

Classes
Top Management vs. Middle Management
Top Management vs. Staff Specialists
Top Management vs. First Line
Middle Management vs. Staff Specialists
Middle Management vs. First Line
Staff Specialists vs. First Line
[The r values are not legible in this copy.]

The job environments of the four managerial
levels, as denoted by the 24 job conditions,
were found to be highly correlated with
one another (see Table 4). The intercorrelations
among middle management, staff specialists,
and first line supervision’s reported
environments are of a magnitude that equals
the reliability of the scale.² Top management’s
reported environment, although significantly
comparable to the respective environments
of the other levels, tended to be of
somewhat smaller magnitude. In general,
however, in spite of significant differences in
reported frequency of occurrence reported
earlier, the profiles are essentially the same.
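
The point that profiles can be essentially the same while the levels still differ in reported frequency can be illustrated with a short sketch. The item means below are hypothetical and are not taken from Table 2; the helper `pearson_r` is written out here only for illustration. A constant shift between two profiles leaves their correlation at 1 even though every item mean differs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical item means for two managerial levels (lower score = more frequent).
upper_level = [1.0, 1.4, 1.9, 2.3, 1.0, 2.8]
# The second level reports every condition as less frequent by the same amount.
lower_level = [m + 0.6 for m in upper_level]

if __name__ == "__main__":
    print("profile correlation:", pearson_r(upper_level, lower_level))
    print("mean difference:", sum(lower_level) / 6 - sum(upper_level) / 6)
```

The correlation describes only the shape of the two profiles, so it is compatible with the significant item-by-item differences found in the t tests.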

These results suggest that the concept of
“management climate” may have some validity,
i.e., that, in effect, an industrial organization
provides a characteristic work atmosphere
within which the management personnel
play their roles. Herzberg, Mausner, Peterson,
& Capwell (1957) suggested that managerial
commonality in supervisory behavior is a
function of the subordinate attempting to
model his behavior after his superior. Insofar
as the conditions probed in this study dealt
largely with interpersonal relations with peers
and superiors, perhaps each level, modeling
after the level above and providing a model
in turn for those below, created a characteristic
pattern of work conditions that pervaded
all levels.

² The odd-even reliability (corrected) of the
description of job environment scale was [...]

SUMMARY 


Four management levels in a moderately 
sized industrial firm were asked to describe 
the extent to which 24 desirable conditions of 
work characterized their respective job envi- 
ronments to determine whether or not, in 
spite of differences within the physical envi- 
ronments of the four levels, conditions relat- 
ing to (a) relation with immediate superiors, 
(b) relationship with company as an insti-
tution, (c) relationship with organizational
peers, and (d) role in decision making and 
implementation, differentiated among the 
levels. 

All management levels studied, in general, 
reported a high incidence of desired condi- 
tions of work. There was some evidence that 
relatively speaking Area 4, Decision Making 
and Implementation, was the area where de- 
sirable conditions of work were reported to be 
least prevalent. 

There was some evidence suggesting that
richness of the job environments in terms of
desirable conditions of work was related positively
to increased status in the hierarchy.
This trend, however, indicated a dichotomous
rather than a four-fold progression with the
two upper echelons indicating significantly
greater overall occurrence than the two lower
levels.

Staff specialists compared to the other 
levels indicated the least incidence of desir- 
able conditions of work—particularly and 
understandably in Area 4, Decision Making 
and Implementation. 

The profiles re occurrence of desirable con- 
ditions of work for all levels were significantly 
parallel in spite of significant differences in 
magnitudes for given items between levels. 
The top management level profile showed the 
greatest deviancy from other levels, however.

DRIVER JUDGMENTS OF RELATIVE CAR VELOCITIES

PAUL L. OLSON, ROBERT A. WACHSLER, AND H. J. BAUER

General Motors

The operator of an automobile is almost 
continually faced with a problem of esti- 
mating the velocities of preceding cars rela- 
tive to his own speed. The great number of 
high-speed rear end collisions indicates that 
these judgments may not be made accurately. 
This seemingly widely held opinion has not
been verified experimentally, however. Thus,
as a preliminary step to finding ways of 
reducing this type of accident it seemed ad- 
visable to assess the ability of people to make 
these relative velocity judgments. 

The purpose of this investigation was twofold.
First, to learn how accurately drivers
can determine whether the gap between their
own and a preceding car was opening, holding
constant, or closing. Second, to determine
how well drivers can discriminate among different
rates of change of this gap.


METHOD

Two independent variables were selected for the
investigation. These were both the direction and rate
of change of the distance between the lead and
following vehicles, and the spacing between the two
vehicles at the start of the subjects’ observation
period.

Two cars were used [...] different ones [...] required
on each [...] several days during the experiment. To
compensate for the discrepancies usually noted in the
speedometers, that of the lead car was recalibrated
to a standard [...] follower [...]. Two way
communication between vehicles was maintained by
means of portable short wave radio.

The direction and rate of change of gap between the
two cars was controlled by holding the speed of the
following car constant at 4[...] the [...] of the [...]
in increments of 10 mph [...]

RESULTS

The matrix presented in Table 1 contrasts
subjects’ judgments with actual conditions.
The row and column headings are presented
in terms of speed discrepancies. For example,
a judgment of −30 implies that the subject
thought that the gap was closing and the
speed discrepancy was 30 mph. Mean judgments
for each condition are listed along the
right edge of the table. Frequencies listed
in the outlined diagonal are correct judgments.

TABLE 1

DISTRIBUTION OF JUDGMENTS MADE UNDER EACH EXPERIMENTAL CONDITION

Rows: Separation Distance in Miles; Actual Speed Difference in MPH
Columns: Subjects’ Judgments of Speed Difference in Miles Per Hour
(−30, −20, −10, 0, +10, +20, +30); Average Speed Judgment
[The cell frequencies are not legible in this copy.]


There was a total of 154 judgments, of
which 62 were correct. Of the 92 which were
incorrect, 62 were conservative in that the
subjects made judgments which were underestimations
of the lead car’s speed. It is particularly
interesting to note that there were
only three reversal errors (errors where the
subject said that the gap was opening when
it was really closing, or vice versa). All these
errors occurred under the same experimental
conditions (0.2-mile separation and +20-
mph speed differential) and amounted to
underestimations of the lead car’s speed.
There were seven decisions which were
potentially dangerous, where the subject
judged the gap to be constant or opening
when in reality it was closing. All these
errors were made when the speed differential
was −10 mph. It should be noted that even
under the .10-mile separation condition a
speed discrepancy of the magnitude of −10
would allow more than a half minute for the
individual to change his mind before it was
too late. The subjects in this study were
allowed only 7 seconds of observation.
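
The "more than a half minute" figure follows from simple arithmetic, sketched below. The sketch assumes both speeds stay constant and that the stated separation is the full gap to be closed; the function name `seconds_until_contact` is hypothetical.

```python
def seconds_until_contact(separation_miles: float, closing_mph: float) -> float:
    """Time in seconds for a gap to close at a constant closing speed."""
    hours = separation_miles / closing_mph
    return hours * 3600.0


if __name__ == "__main__":
    # The two separations used in the study, closing at the minimum rate of 10 mph.
    for separation in (0.10, 0.20):
        print(separation, "mile:", seconds_until_contact(separation, 10.0), "seconds")
```

At the .10-mile separation the gap would take about 36 seconds to close, roughly five times the 7 seconds of observation allowed.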


The analysis of variance showed differences 
significant at the .01 level among the mean 
judgments made under the various speed 
conditions at both separations. The results of 
the Duncan tests indicated that only 8 of 
the 91 pairs of means did not differ signifi- 
cantly at the .05 level. 

An information analysis was run on the 
data presented in Table 1 to determine the 
amount of information in bits transmitted 
under each of the separation conditions. 
Under ideal circumstances, where the subject 
unerringly guessed the direction of change as 
well as the exact speed discrepancy, 2.81 bits 
would have been transmitted by seven stimuli 
such as were employed in this study. The 
calculations showed that 1.05 and 1.38 bits
were transmitted at the .20- and .10-mile
separations, respectively.
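
The 2.81-bit ceiling is the base-2 logarithm of seven equally likely stimuli. A minimal sketch of the usual transmitted-information calculation is given below; it assumes the standard formula T = H(stimulus) + H(response) − H(joint) applied to a stimulus-by-response frequency matrix. The function name `transmitted_bits` and both example matrices are hypothetical and are not the data of Table 1.

```python
from math import log2


def transmitted_bits(matrix):
    """Transmitted information in bits for a stimulus-by-response count matrix."""
    total = sum(sum(row) for row in matrix)

    def entropy(counts):
        # Shannon entropy of a set of counts, skipping empty cells.
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    row_totals = [sum(row) for row in matrix]          # stimulus marginals
    col_totals = [sum(col) for col in zip(*matrix)]    # response marginals
    cells = [c for row in matrix for c in row]         # joint distribution
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)


# Hypothetical matrices: every judgment correct, and judgments unrelated to stimuli.
perfect = [[11 if i == j else 0 for j in range(7)] for i in range(7)]
unrelated = [[1] * 7 for _ in range(7)]

if __name__ == "__main__":
    print("ceiling for seven stimuli:", log2(7))
    print("perfect judgments:", transmitted_bits(perfect))
    print("unrelated judgments:", transmitted_bits(unrelated))
    # Reported values expressed as a share of the ceiling.
    for bits in (1.05, 1.38):
        print(bits, "bits is", bits / log2(7), "of the ceiling")
```

On this reckoning the reported 1.05 and 1.38 bits are roughly 37% and 49% of the ceiling.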

It is evident from Table 1 that subjects
tended to underestimate the speed of the
lead car. The overall mean difference was
4.6 mph. The mean estimates at each condition
are plotted in Figure 1. The best fit line
drawn through the data illustrates the accuracy
with which one could predict a mean
response knowing the actual conditions. The
correlation between the mean estimates and
the actual conditions was very high, r = .99.

FIG. 1. Relationship between actual speed differences
and subjects’ estimates of the speed differences.
(Axis label: Mean Subject Judgment.)

Consideration of the overall means, how- 
ever, tends to mask much interesting data. 
For instance, it should be noted that the 
subjects were much more accurate in making 
their judgments when the gap between the 
cars was closing than when it was opening.
The mean error of estimate was only 1.5
mph when the gap was closing as contrasted
with 8.6 mph when it was opening. Subjects’
estimates were also more accurate at the .10-
mile interval as compared with the .20-mile
distance. The mean error at the .10-mile sepa- 
ration was 0.9 mph, while that at the .20- 
mile separation was 6.8 mph. 

Figure 2 shows a plot of mean judgments 
made of the various speed conditions at the 
different separations. It can be seen that the 
estimates tended to be quite accurate when 
the gap was closing in either instance but, 
especially at the +20 and +30 differential,
the estimates made at the longer distances
were much less accurate.

[FIG. 2. Mean judgments of the speed differences at the two separations; the caption is only partly legible.]

DISCUSSION


It seems clear from the data that people 
are capable of rather accurate discriminations 
in making judgments regarding velocity dif- 
ferentials. Most of the errors were made in 
a direction which would, if anything, pre- 
cipitate more caution on the part of the 
operator than might have been deemed neces- 
sary if the judgment had been correct. 

The information analysis revealed that the 
subjects were receiving only enough informa- 
tion to reduce uncertainty by about one-half. 
This would imply that the subjects were able 
to reject three to four possibilities in each
judgment situation and make a random
choice among the others. From the data it is
apparent that the subjects seldom erred in
judging whether the gap was opening or
closing though there was some confusion with
the constant situation. Thus the problem appeared
to be such that the subjects could
determine the direction of change without too 
much trouble, but they were much less certain 
when estimating the precise speed differential. 

The most accurate judgments were made
at the closer distances and under conditions
where the gap was closing. This is not surprising
certainly, but it is comforting. It is
particularly interesting to note that there
were very few dangerous decisions and that
all of these were made when the gap was
closing at the minimum rate.

In general then, people tend to do rather
well in making the type of judgments called
for in this study. It would further appear that
there is little reason to believe that
dangerous actions would be frequently based
on the information supplied by these types
of judgments.


SUMMARY AND CONCLUSIONS 


Eleven subjects were evaluated for their 
ability to detect the direction and rate of 
change of the interval separating the car in 
which they were riding from a preceding car. 
This interval was set at one of two magni- 
tudes and could remain constant or open or 
close at one of three rates.

The following conclusions are based on the
data collected:

1. In the range of speed differences tested,
people tend to be quite accurate in determining
whether the distance between their car and
a preceding one is increasing or decreasing.

2. People exhibit a better than chance 
ability to discriminate between opening and 
closing rates at least as fine as 10 mph. 

3. The accuracy with which judgments 
such as these can be made increases as the 
distance between the vehicles decreases. 

4. Judgments are made more accurately 
when the gap is closing than when it is 
opening. 


A. Wachsler, and H. J. Bauer 


5. In the range of speed differences studied, 
subjects tended to underestimate the relative 
speed differential between their car and the 
one in front of it.

A COMPARISON OF TWO METHODS OF TRAINING
IN A COMPLEX TASK BY MEANS
OF TASK SIMULATION¹


J. S. KIDD 


Ohio State University 


One of the most obvious trends in military 
and industrial operations has been the gradual 
upgrading of skill requirements for workers 
as basic jobs have become more complex. The 
change in the nature of the jobs has created 
the need for coordinated changes in worker 
training. Training philosophy, methods, and 
equipment are all susceptible to review and 
modification in the face of these increasingly 
imperative requirements for skilled operators 
and technicians. 

The present experiment was undertaken
within a context provided by one of the more
promising developments in the training field;
namely, system simulation. Simulation itself
is not a new concept. The Link trainer of
pre-World War II vintage is a relatively
modern application. However, recent progress
in instrumenting the simulation process, in
combination with the changing nature of the
tasks to be learned, has greatly enhanced both
the current usefulness and future potential of
simulation techniques (Chapman, Kennedy,
Newell, & Biel, 1959; Goodwin, 1957). The
long history of simulation and its present
rapid development, unfortunately, have not
been accompanied by very extensive research
support. In the particular area of pilot training,
where most of the research to date has
been accomplished, there are still more un- 
answered questions than data (Muckler, 
Nygaard, O'Kelly, & Williams, 1959). 
Among the host of detailed problems yet
to be fully understood is the influence of task
load on training efficiency in a simulated
setting. Task load was selected for investigation
in the present experiment since its influence
has already been established in quite
different task settings. Barch (1953), Barch
and Lewis (1954), Green (1955), and
Szafran and Welford (1950) have all found
that a relatively high level of “task difficulty”
or task load imposed relatively early in the
learning sequence was facilitative. The tasks
employed were predominantly of the type
dependent on psychomotor skills. In the
present instance the basic hypothesis is extended
to a predominantly discrimination-
decision making task provided by the simulation
of a radar air-traffic control center
operation.

¹ This research was carried out in the Laboratory
of Aviation Psychology and was supported by the
United States Air Force Contract No. AF
616)-3612, monitored by the Aerospace Medical
Division. Permission is granted for reproduction,
translation, publication, use, and disposal in whole
and in part by or for the United States Government.


METHOD

Apparatus, Task, and Subjects

The general task environment was provided by the
simulation within the laboratory of a radar air traffic
control center. The simulation was implemented by
the specially developed OSU Electronic Air Traffic
Control Simulator (Hixson, Harter, Warren, &
Cowan, 1954). This device, which is built around
an analog computer, is capable of generating up to
30 aircraft targets and presenting them realistically
to the radar controller via a cathode ray tube
display. Direct manipulation of the “aircraft” is
accomplished by college students trained to faithfully
carry out pilot functions. In addition to the visual
display of aircraft position available to the controller,
he is in direct auditory communication with
the “pilots” under his jurisdiction through simulated
radio channels.

The task requires S to act as a radar controller.
He is responsible for the guidance of aircraft within
a specified zone of responsibility. The normal approach
route is [...] mi. in length. The controller
must manipulate the position, heading, airspeed and
altitude of the aircraft under his direction while they
are coming in to land. He must see to it that the
landing approach is made expeditiously and
safely.

The 16 novice controllers who participated in this
study were selected from a total population of twice
that number of undergraduate students at Ohio
State University. These students were initially
employed in earlier studies as pilots and thus were
familiar with the simulation operations. Selection
was based on the proficiency shown during pretraining
sessions. The best 16 Ss were chosen as
controllers.


Description of Experimental Variable 


The results of prior experiments in this series have 
led to the conclusion that the number of aircraft 
under control (rather than, for example, the entry 
rate parameter per se) is the most valid single 
determinant of input load (Kidd & Kinkade, 
1958). It was possible through the use of special 
procedures to maintain the number of aircraft under 
control (target density) at any desired level for any 
required duration. Thus, input load level in these 
terms could be maintained at a constant level 
throughout a given problem.

A comparison was made in this study between
training under consistently high input load conditions
(Experimental Group I) vs. training under
conditions of gradually increasing input load
(Experimental Group II).


The exact level of input load assigned to Experi- 
mental Group II was determined on the basis of 
data obtained during a set of special pre-experimental 
trials. Three novice controllers, having a tested ability 
level at the median value of the total sample used 
in this study, were each given a series of 10 special 
problems. The target density was varied from 
problem to problem within the range from four to 
six aircraft under control. The order of the problems 
was random for each controller. The results were 
pooled to provide a base relationship between target 
density and average control time, one of the funda- 
mental criteria of performance, and between training 
trials and average control time. By arbitrarily select- 
ing a single value on the average control time 
dimension, it was possible to derive an equation 
which would give a fair approximation of the trade- 
off function between the level of training and target 
density. Thus, performance could be theoretically 
held constant across training trials by simultaneously 
manipulating density. The graduated input load 
schedule was determined on this basis. The resultant 
progression in order was 4, 4, 4, 5, 5, 5, 5, 6, 6, 
6 aircraft under control for the series of 10 problems 
in this experimental condition.
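
The two input load conditions can be written out side by side as a short sketch. The graduated progression is the one stated above and the constant level of six aircraft is the one assigned to Experimental Group I; the variable names and the summary helper are hypothetical. The trade-off equation used to derive the graduated schedule is not given in the report and is not reproduced here.

```python
# Target density (aircraft under control) for each of the 10 problems.
GRADUATED_LOAD = [4, 4, 4, 5, 5, 5, 5, 6, 6, 6]   # Experimental Group II
CONSTANT_LOAD = [6] * 10                           # Experimental Group I


def mean_density(schedule):
    """Average number of aircraft under control across a schedule."""
    return sum(schedule) / len(schedule)


if __name__ == "__main__":
    for name, schedule in [("graduated", GRADUATED_LOAD), ("constant", CONSTANT_LOAD)]:
        print(name, schedule, "mean density:", mean_density(schedule))
```

Averaged over the 10 problems, the graduated group therefore handled one aircraft fewer per problem than the constant group, with the two conditions converging on the same density for the last three problems.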

The other experimental condition (Experimental 
Group I) required that the level of aircraft under 
control be held at six aircraft for all trials. A graphic 
comparison of the two experimental conditions is 
presented in Figure 1. The three novice controllers 
utilized in the preliminary evaluation of the variables
were not included in the study proper.


Initial Training and Matching Trials 


In order to provide data for selection purposes 
and matching of participants, a program of initial 
training was undertaken prior to the study proper. 
The initial training period involved a total of 8 hr. 
Seven hr. were spent in classroom training as a 
group, and 1 hr. was devoted to individual training. 
In detailed breakdown, the preliminary training consisted
of 2 hr. of lectures on the principles of air
traffic control given by E, 4 hr. in practice of com- 
ponent skills, 1 hr. of observation in the simulated 
control center, and 1 hr. of operational practice as 
a controller. 

For component skill training, stimulus materials 
were recorded on a strip film for group presentation, 
and conditions of both written and verbal responses 
were employed. The exercises in aircraft performance 
characteristics were in the form of written problems 
in which the trainees were given the values of certain 
variables and were asked to estimate the nature of
the required command.

The observation sessions were set up so that the 
trainees could observe the performance of an
experienced controller during a typical exercise. The
operational practice was carried out under low traffic
load conditions with maximum feedback of knowledge of
performance. The results of the operational practice
trials were utilized for matching purposes in the
experiment proper.


Statistical Design and Procedure 


The nature of the input load variable employed
in this study required that independent sets of
controller trainees be used in each experimental
condition. In order to maximize statistical power, the
matched-pair technique was used. That is, on the
basis of the 1-hr. practice trial, the novice controllers
were ranked according to initial proficiency. Trainees
were paired on the basis of initial performance and
one of each pair assigned randomly to one of the
two experimental groups, the other S going to the
remaining group.
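
The matched-pair assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the trainee labels, and the convention that a higher practice score means higher initial proficiency are all hypothetical and do not come from the study.

```python
import random


def assign_matched_pairs(scores, seed=None):
    """Rank trainees by practice score, pair adjacent ranks, and
    randomly send one member of each pair to each group.

    scores: dict mapping a trainee label to a practice-trial score
            (assumed here: higher score = more proficient).
    Returns (group_i, group_ii); with an odd number of trainees the
    lowest-ranked trainee is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    group_i, group_ii = [], []
    for k in range(0, len(ranked) - 1, 2):
        pair = [ranked[k], ranked[k + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        group_i.append(pair[0])
        group_ii.append(pair[1])
    return group_i, group_ii


# Hypothetical practice-trial scores for six trainees.
practice = {"T1": 61, "T2": 58, "T3": 47, "T4": 45, "T5": 33, "T6": 30}
group_i, group_ii = assign_matched_pairs(practice, seed=0)
```

Pairing adjacent ranks keeps the two groups closely equated on initial proficiency, while the shuffle within each pair preserves random assignment to condition.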

Experimental sessions were scheduled in the ABBA 
sequence. Order effects were minimized by this bal- 
anced schedule. Each session consisted of five prob- 
lems or exercises. Each exercise was of 30-min. dura- 
tion. Each controller-trainee participated in 10 
exercises during the course of two successive sessions
separated by a 24-hr. interval.


Measures of Performance


In order to achieve a detailed coverage of system
performance, multiple criteria were employed. The
most reliable of these measures has been mean
percentage delay, which was calculated by determining
the minimum theoretical flight time for each aircraft
processed, subtracting this figure from the observed
flight time, and dividing by minimum flight time.
This delay ratio score was then averaged for all
aircraft in a problem, giving a mean delay score for
each problem.

Fuel consumption was a second continuous
criterion. It was computed separately for each
aircraft on the basis of hypothetical fuel consumption
curves which took into account three factors:
aircraft type, airspeed, and altitude.

Two noncontinuous measures were employed.
Failures to achieve proper landing set-ups were
tallied for one such measure and the number of
separation errors per problem was the other. A
separation error was defined as the failure to maintain
30-sec. flight time minimum distance between
aircraft. This standard translated to as much as 6
lateral miles or 6,000-8,000 ft. vertical separation
during early stages of the landing approach.
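
The delay measure described above can be sketched as a short computation. The function names and the flight times below are hypothetical and serve only to illustrate the arithmetic: excess flight time divided by minimum theoretical flight time, averaged over the aircraft in one problem.

```python
def delay_ratio(observed_time, minimum_time):
    """Excess flight time as a proportion of the minimum theoretical time."""
    if minimum_time <= 0:
        raise ValueError("minimum flight time must be positive")
    return (observed_time - minimum_time) / minimum_time


def mean_delay_score(flights):
    """Average the delay ratio over all aircraft processed in a problem.

    flights: list of (observed_time, minimum_time) pairs, in the same units.
    """
    if not flights:
        raise ValueError("at least one aircraft is required")
    ratios = [delay_ratio(obs, minimum) for obs, minimum in flights]
    return sum(ratios) / len(ratios)


# Hypothetical problem with three aircraft (times in minutes).
problem = [(12.0, 10.0), (15.0, 10.0), (8.0, 8.0)]
score = mean_delay_score(problem)   # proportion; multiply by 100 for percent
```

Dividing by the minimum flight time makes the score comparable across aircraft whose routes differ in length, which is presumably why a ratio rather than raw excess time was averaged.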


RESULTS 


The first consideration is the over-all learn- 
ing achieved by the novice controllers during 
the training sessions. The main continuous 
performance measure, flight delay, is pre- 
sented in graphic form in Figure 2. The 
progressive reduction in excess flight time for 
Experimental Group I indicates the effect of 
training under constant load conditions. The 
curve for Experimental Group II is broken 
into three sections to indicate the break
between levels of input load.

A statistical comparison between experimental
conditions was made on the basis of
performance on Trial 10 which is regarded as 
the test trial. Since the participants were 
matched on the basis of a preliminary test 
trial, the t test for matched groups and the
Walsh test were appropriate. The results of
these tests for all criteria employed are
presented in Table 1. The superiority of Group I
over Group II is supported at the .01
probability level on the criterion of excess flight
time. The probability level for excess fuel
consumption is .025. The differences between
the two groups on landing set-up errors and
on separation errors per aircraft processed are
not significant. However, total aircraft
processed per 30-min. trial is significant at the
.055 level. On no measure of performance did
there occur a reversal of the major trend.

TABLE 1
SUMMARY OF STATISTICAL TESTS OF ... EXPERIMENTAL
GROUP I AND EXPERIMENTAL GROUP II
(Table body not recoverable. Surviving row labels:
mean excess fuel consumption, mean landing set-up
errors per aircraft, aircraft processed per 30 min.)
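
For readers unfamiliar with the matched-groups t test named above, a minimal sketch of the statistic is given here. The data are invented for illustration and are not the study's values; the function name is an assumption, and the Walsh test is not shown.

```python
import math
import statistics


def matched_t(scores_a, scores_b):
    """t statistic for matched pairs: mean difference over its standard error.

    scores_a[i] and scores_b[i] must belong to the same matched pair.
    Returns (t, degrees_of_freedom).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("each score needs a matched partner")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)      # sample SD of the differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical Trial 10 delay scores for four matched pairs.
group_ii = [0.40, 0.55, 0.35, 0.50]
group_i = [0.30, 0.35, 0.30, 0.40]
t_value, df = matched_t(group_ii, group_i)
```

The obtained t would then be referred to a t table with the returned degrees of freedom. Working on pair differences removes between-pair variation in initial proficiency, which is the gain in power the matching was intended to provide.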

A somewhat different approach to the 
analysis of these results leads to a considera- 
tion of the mechanics involved in the differ- 
ence noted between the two groups. A 
potentially important consideration is the
cumulative number of aircraft processed over
the total 10 trials. Figure 3 compares the
two groups on this index. Group I, in this 
case, starts at a higher level and maintains 
a greater rate of increase throughout the 
series. Figure 4 compares the two groups on 
cumulative gross error frequency. 





FIG. 4. Cumulative relative gross error frequency
(landing set-up errors plus separation errors) over
trials for Group I (Constant Load) and Group II
(Graduated Load).


DISCUSSION 

The agreement between the results of the 
present study and others (Barch, 1953; Barch 
& Lewis, 1954; Green, 1955; Szafran & Wel- 
ford, 1950) which have employed a similar 
variable is substantial. This congruence is 
made more significant by the fact that while 
most previous studies employed tasks which 
depended upon the participants’ motor skills, 
the present task was largely perceptual and 
cognitive in nature. This latter characteristic 
is of progressively increasing significance as 
industrial tasks increase in complexity (Gagne 
& Bolles, 1958). 

In spite of the agreement in the data, however,
there is not at present a well accepted
explanation for the common outcome. Mechanisms
such as expectancy, frustration, and
motivation have been suggested, but none
has gained ascendancy. In the present study
it is not possible to make a clear differentiation
between the actions of the mechanisms
that have been proposed so far. However, it
is possible to emphasize certain aspects of the
results which may provide some orientation.

Such an analysis requires a re-examination
of the fundamental effects of reward and
punishment on the learning process. Learning
is generally thought to take place under
conditions of both reward (for correct
responses) and punishment (for incorrect
responses). It may be assumed for the present
that in the task utilized here, each aircraft
landed constituted a reward and each
separation error and landing set-up error was a
punishment (unpleasant event). The present
study provides an accurate portrayal of many
nonlaboratory learning situations due to the
fact that both kinds of events, reward and
punishment, occurred throughout training.

If Experimental Group I is compared with
Experimental Group II as is done in Figures
3 and 4, it is apparent that there were
differential frequencies of both kinds of events for
the two groups. Experimental Group I
experienced both more rewards and more
punishments than did Experimental Group II. Now,
if it is postulated that punishment is
disruptive during learning and leads to low
incentive to continue by frustrating the trainee,
the prediction would be that the learning of
Group I would be retarded. However, Group I
turned out to be substantially superior to
Group II.

It might be concluded, therefore, that it
is not the kind of reinforcement but the total
level of reinforcement which is the operative
factor. Thus, the level of feedback of
knowledge of results is higher for Experimental
Group I on both counts, knowledge of
successes and knowledge of errors. Group I
simply experienced a heightened rate of
information feedback and this eventuated in
superior performance.

While the above speculations are compatible
with the data, there are others that would
be equally so. It is possible, for example, that
participants in Experimental Group II learned
a subset of responses under low input load
conditions that were inappropriate to the
higher input load conditions. Thus, at each
change in input load, new responses would
have to be learned under some degree of
interference from the responses already
acquired. It is likewise plausible to suggest that
Experimental Group I participants were
indoctrinated to a higher level of effort and
aspiration by the implicit suggestion that it
was possible to operate efficiently under
higher input load conditions. Thus, rather
than motivationally detrimental, the higher
input load may have been quite compatible
with the achievement aspirations of the
participants.

Theoretical mechanisms aside, the practical
implications of the findings, while limited by
the conditions of the study, appear to be
relatively straightforward. Thus, given a
relatively complex task wherein most of the
component skills have been previously acquired, a
relatively high level of activity at the outset
of the system-training phase of the total
training program seems to be desirable when
the anticipated high initial error frequency
will not involve disastrous consequences.


SUMMARY 


Improvement in performance with training
in a complex task of radar air traffic control
was compared under a condition of constant
high input load during training vs. a
condition of graduated input load during training.
Relative input load was defined as the
number of aircraft under the control of a single
operator.

The test performance of Ss trained under
high constant input load was significantly
superior on several criteria to that of Ss
trained under the graduated input load
condition.

An explanation was proposed in terms of
the heightened frequency of feedback of
knowledge of performance experienced by the
high constant input load group.


DRIVER OPINIONS AND REPORTED PERFORMANCE
UNDER VARIOUS INTERCHANGE MARKING AND
NIGHTTIME VISIBILITY CONDITIONS¹


MARVIN D. DUNNETTE


University of Minnesota 


Nighttime driving conditions offer special 
problems of visibility. This is especially true 
at highway intersections. As a driver proceeds 
over any highway system, he continually ar- 
rives at a series of intersection choice points. 
Most drivers know where they want to go, 
but they do not always know exactly how to 
get there. It is, therefore, of obvious impor- 
tance to develop and utilize systems which 
will enhance nighttime visibility and thereby 
provide drivers with optimal information
about the route or routes they may be
following.

These considerations point up the impor- 
tance of providing adequate markings and 
conditions of visibility at highway
intersections. Highway systems throughout the
country have made wide and effective use of
illumination and reflectorization to
accomplish these aims. A good deal of research
utilizing direct physical measurement has
been performed in an effort to assess the
degree of visibility improvement under a
variety of conditions of illumination.


In addition to widespread research on
levels of visibility and their relative
effectiveness, attention has been given to the relative
utility of different marking systems in
directing or guiding driver performance. As
mentioned previously, appropriate guidance of
drivers is particularly important at
intersections; the marking system should be sufficient
to reduce any potential confusion or error on 
the part of the driver. 

The study reported here was designed to 
discover possible effects of different highway 
nighttime visibility conditions and different 
highway marking systems on driver
performance. Research was undertaken over a period
of 7 weeks during the summer of 1959. The
experiments were conducted in the state of
Minnesota on a cloverleaf interchange formed
by the intersection of U. S. Highway 61 and
Minnesota State Highway 36. A variety of
experimental conditions of varying visibility
and using varying systems of highway
markings was utilized and driver performance
studied. All experimental studies were
conducted during night driving conditions
between the hours of 9:30-11:30 P.M.

1 This article is a revision of papers which were
read before meetings of the Highway Research
Board in Washington, D. C. on January 13, 1960
and the Midwestern Psychological Association in St.
Louis, Missouri on April 30, 1960.

METHOD 


Five conditions were employed during the period
of the study.

Condition I, the Fully Illuminated condition,
duplicated conditions typical for the intersection. The
mercury vapor luminaires were turned on as is usual
and no special treatment of reflectorization was
employed.

Condition II, the Dark condition, consisted simply
of turning off the luminaires, still using no
reflectorization other than the signs employed to show
turn areas.

Condition III utilized a standard application of
reflectorized delineation. Under this condition, the
lights remained off, but reflective treatment was
employed in the form of amber delineators in the loops
and legs of the cloverleaf similar to the standards
contained in the Manual on Uniform Traffic Control
Devices for Streets and Highways of the State of
Minnesota.

Condition IV utilized an experimental method of
reflectorization. The luminaires remained off, and
blue and amber delineators and blue and amber
reflective pavement paints were used to indicate areas
of exiting and merging traffic. The entire cloverleaf
interchange was not treated; only the portions which
would be directly visible to a motorist traveling
north on U. S. 61 or traversing the ramps in the
southeast quadrant received reflective treatment.

Condition V combined the treatments of full
illumination and Experimental Reflectorization. The
luminaires were turned on. The reflective treatment
was maintained as in Condition IV.

2 Figures showing the placement of delineators and
the nature of the pavement treatment employed in
Conditions IV and V may be obtained from the
author upon request.

A detailed technical description of the experimental
reflectorization which was employed is given by
Fitzpatrick, Joseph T., Integrated reflective sign,
pavement and delineation treatments for night traffic
guidance, unpublished paper, January 1960. A copy
of this paper may be obtained from M. D. Dunnette
upon request.
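
The five conditions can be summarized as combinations of two factors, illumination and reflective treatment. The sketch below is only a convenient restatement; the class and field names are hypothetical, and "none" is used here to mean no reflectorization beyond the existing turn signs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    label: str              # Roman numeral used in the text
    name: str
    luminaires_on: bool
    reflectorization: str   # "none", "standard", or "experimental"


CONDITIONS = [
    Condition("I", "Fully Illuminated", True, "none"),
    Condition("II", "Dark", False, "none"),
    Condition("III", "Standard Delineation", False, "standard"),
    Condition("IV", "Experimental Reflectorization", False, "experimental"),
    Condition("V", "Combined Illumination and Reflectorization",
              True, "experimental"),
]

# Conditions that used the experimental treatment (IV and V).
experimental = [c.label for c in CONDITIONS
                if c.reflectorization == "experimental"]
```

Laid out this way, it is apparent that the design is not a complete crossing of the two factors: standard delineation was studied only with the luminaires off.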


The intent of this study was to examine driver
performance under these various conditions of
visibility and highway marking systems. Performance
was studied by interviewing motorists after they had
traversed the intersection. Interviewing stations were
located at two points, A and B. Motorists
interviewed at Station A were those who had just left
U. S. 61 and were about to enter and proceed in an
easterly direction on Minnesota 36. Motorists
interviewed at Station B were those who had proceeded
straight through the interchange from south to north
on Highway U. S. 61 and also those who had just
entered U. S. 61 from Minnesota 36 via the
cloverleaf loop in the southeast quadrant.


Prior publicity via press, radio, and TV referred to
the fact that a study was to be conducted using
various experimental conditions. None of the publicity
described details of the conditions nor was any
information supplied which could be helpful to local
drivers in interpreting the meaning of the various
experiments.

As motorists approached Points A and B, they
were signaled to stop and were asked to answer a
series of questions requiring about 5 minutes. The
interview schedule was designed to obtain
information in five major areas:

1. Personal information such as sex, age,
familiarity with the interchange, etc.

2. The extent to which the driver did or did not
experience difficulty in choosing the correct turn or
route through the interchange.

3. Suggestions, if any, that the driver might offer
for improving the guidance system used in the
interchange.

4. Markings used by the driver or found helpful
by him for recognizing certain critical response zones
such as areas of exiting and merging traffic.

5. Personal impressions or opinions voiced by
drivers concerning the reflectorized treatment
in Conditions IV and V.


The Sample 


A total of 1137 motorists was interviewed at the
two stations, A and B. The numbers interviewed
ranged between 199 for Condition V and 270 for
Condition I.

A large majority of the motorists interviewed were
men, comprising 970 of the drivers; only 167 were
women. Somewhat fewer than half the drivers were
in the age range 26-40 with the remainder being
distributed equally between the under-25 and over-41
groups.

A large majority of drivers participating in the
study were familiar with the cloverleaf interchange.
Over half said they used the interchange daily. An
additional 30% reported using the interchange at
least once a week or oftener. Fewer than one in six
reported being totally unfamiliar with the
interchange. Examination of the frequencies of use of the
intersection by drivers under the different
experimental conditions showed no differences. At both
interviewing stations, chi square tests showed no
significant relation between familiarity with the
interchange and experimental condition.
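
A chi square test of this kind compares observed frequencies in a familiarity-by-condition table with the frequencies expected if the two were unrelated. The sketch below uses invented counts (the study's cell frequencies are not reported here), and the function name is an assumption.

```python
def chi_square(table):
    """Pearson chi square for a contingency table given as a list of rows.

    Assumes every row and column total is positive.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are familiarity levels (daily, weekly,
# unfamiliar); columns are Conditions I through V.
counts = [
    [70, 60, 58, 55, 52],
    [40, 36, 33, 34, 30],
    [20, 18, 17, 15, 16],
]
statistic, df = chi_square(counts)
```

A small statistic relative to its degrees of freedom, as reported in the text, indicates that the samples interviewed under the five conditions were comparable in familiarity, so differences among conditions are not attributable to who happened to be driving.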


RESULTS 


Only a small minority of respondents said 
they experienced any difficulty making their 
way through the intersection. The numbers 
and percentages of persons saying they had 
some difficulty are shown in Table 1. It may 
be noted that at Station A, the highest
incidence of driver difficulty occurred under
Conditions II and III. Under these two
conditions, nearly one in eight drivers experienced
difficulty locating the exit ramp to Minnesota
36. Under the Fully Illuminated and
experimentally reflectorized conditions (Conditions
I and IV), practically no one (fewer than 1 in
50) experienced difficulty.

This finding is important for two reasons:
(a) lighting is confirmed to be an effective
way of reducing driver confusion and possible
error, and (b) the Experimental
Reflectorization is shown also to be an effective means of
reducing driver difficulty in traversing the
interchange.

At Station B, only 12 drivers (about 1
[illegible]) experienced any difficulty traversing
the interchange. This is an expected result since it
is easier to drive straight through an
intersection than to locate a particular point or
turn off.

Table 2 shows the numbers of motorists
who volunteered suggestions for improving
the marking or visibility of the intersection
in some way. At both stations, fewest
suggestions for improvement occurred under the
two conditions employing Experimental
Reflectorization.

It is noteworthy that a substantial decrease 
in suggestions for improvement occurred be- 
tween Condition II and Condition I and that 
a somewhat larger and significant decrease 
occurred between Condition II and Condition 
IV. Apparently the reflectorized treatment is
effective in offering both adequate visibility
and guidance.

A study of the actual suggestions made by
those motorists who offered them gives
further meaning to these results. Under
Condition I, the major suggestion was that more
signs be placed at the intersection; a few
motorists also suggested the use of markings
such as arrows on the pavement, markers
along the side of the road, and more vivid
center stripes. Under Condition II, the major
complaint apparently was caused by the
darkness. Although some motorists still
mentioned the need for more signs and markings,
most simply said "Turn on the lights." Under
Condition III, suggestions for improvements
included most of the factors mentioned under
the first two conditions. Suggestions under
Condition IV were fewer in number (as
shown in the tables), and seemed somewhat
more specific than those offered under the
first three conditions. Fewer suggestions were
offered under the combined conditions of
illumination and reflectorization than under any
other condition. This is evidence that a large
majority of drivers believed both visibility
and guidance to be adequate.

TABLE 1
NUMBERS AND PERCENTAGES OF DRIVERS WHO REPORTED
EXPERIENCING SOME DIFFICULTY TRAVERSING THE INTERCHANGE
(Columns: Station A and Station B, each divided into
Had Difficulty and Had No Difficulty, with percentages.
Rows: Fully Illuminated; Dark; Standard Delineation;
Experimental Reflectorization; Combined Illumination
and Reflectorization. Cell values not recoverable.)

TABLE 2
NUMBERS AND PERCENTAGES OF DRIVERS WHO OFFERED
SUGGESTIONS FOR INTERCHANGE VISIBILITY AND/OR MARKINGS
(Columns: Offered Suggestions and Offered No
Suggestions, with percentages. Rows: the five
conditions as in Table 1. Cell values not recoverable.)

Regardless of the experimental condition,
the vast majority of drivers exiting onto
Minnesota 36 from U. S. 61 believed the
route markings gave them adequate
information about where to turn. The percentage of
drivers saying this ranged from a low of 90%
under Conditions II and III to a high of 97%
for Condition IV. Most drivers, apparently
because of their familiarity with the
interchange, already knew where to turn. In
addition, however, it appears that the sign
indicating the approaching turn was a primary
source of guidance for drivers encountering
the first three conditions. Under the
Experimental Reflectorization, however, the sign
seemed less important, and the delineator and
pavement treatments were mentioned more
often. This could be due partly to the
"newness" of the experimental treatment. It is
possible that the pavement colors stood out
so sharply as to attract driver attention and
comment to a greater degree than might have
been the case, had the drivers been more
familiar with the Experimental
Reflectorization.

Motorists who were driving through the
intersection from south to north on U. S. 61
were nearly unanimous in their belief that the
through route was sufficiently well marked.
Another significant need for through
motorists, however, is to be clearly aware of areas
of exiting and merging traffic. These are
critical response areas for the motorist and it
is in and near these areas that improved
visibility and guidance may be most important.
Data shown in Tables 3 and 4 give
information about identification of these areas under
the various experimental conditions.

TABLE 3
NUMBERS AND PERCENTAGES OF THROUGH TRAFFIC DRIVERS
SAYING THEY COULD OR COULD NOT IDENTIFY AREAS OF
MERGING TRAFFIC
(Table body not recoverable.)

It may be noted that areas of merging and 
exiting traffic were recognized by a high ma- 
jority of drivers. The highest degree of
recognition occurred under Conditions IV and V.

The interview schedules also requested in- 
formation about the methods used by drivers 
in recognizing areas of merging and exiting 
traffic. Over half the drivers under the last 
two conditions mentioned the colors on the 
pavement and on the delineators as important 
sources of information. Few drivers (just over
1%) under the reflectorized conditions
mentioned traffic flow as giving them evidence
about merging and exiting areas; under the
first three conditions about 10% identified
traffic flow as their major source of
information.

It is evident, therefore, that many
drivers (over half) did associate the Experimental
Reflectorization treatment with the
identification of areas of merging and exiting traffic. It
is difficult, however, to judge whether or not
this is of practical importance. Even under
the Dark condition, 94% of drivers
successfully identified areas of merging traffic; one
may well question, therefore, whether the
increase in successful identification to 97% for
Condition V is of any practical consequence;
further research is needed on this question.

TABLE 4
NUMBERS AND PERCENTAGES OF ... DRIVERS SAYING THEY
COULD ... IDENTIFY AREAS ...
(Title incomplete; table body not recoverable.)

As explained previously, the interview 
schedules were designed, in part, to elicit 
opinions and impressions from motorists con- 
cerning the Experimental Reflectorization em- 
ployed in Conditions IV and V. 

A large majority of motorists recognized 
intended relationships among the various 
markings. For example, over a third of the 
motorists noted that the blue of the exit 
ramp matched the blue of the sign indicating 
the location of the exit. It also was common 
for motorists to associate the amber or yellow 
colors of the pavement and delineator
treatment with SLOW or CAUTION. There was
nearly unanimous agreement that the
reflectorized treatment was helpful in driving.
Many motorists volunteered comments
indicating a generally favorable attitude toward
this particular experimental treatment.


DISCUSSION 


This study was undertaken in order to 
study driver performance and opinions under 
different conditions of night visibility and 
with the use of various highway marking 
systems. Motorists taking part in the study 
were, as a group, highly familiar with the 
interchange chosen for study; and were in a 
position to offer informed opinions concern- 
ing the effects of the several experimental 
conditions employed. 

Since differences in driver opinions and re- 
ported performance were obtained under the 
various conditions, it is likely that drivers are
aware of and concerned about different night
driving conditions. Opinions obtained from
drivers in this study suggest that they are
more confident, have less difficulty, and have
a better opportunity to do a good job of night
driving when visibility and guidance are
improved either by illumination, reflectorization,
or both. More drivers experienced difficulty in
traversing the interchange and more drivers 
made suggestions for improvements under the 
Dark and Standard Delineation conditions 
than under the other three experimental con- 
ditions. 

The results of the study also provide clues 
concerning the possible effects on night driv- 
ing performance of the Experimental Reflec- 
torization employed in Conditions IV and V. 
It appears that the reflectorization treatment 
is readily related by the motorist to certain 
night driving needs. For example, 


1. A significantly smaller number of motor- 
ists made suggestions for improvements under 
Condition V—the combined condition of full 
illumination and Experimental Reflectoriza- 
tion—than under any of the other four condi- 
tions. The proportions of motorists making 
suggestions increased progressively for condi- 
tions of Experimental Reflectorization, Fully 
Illuminated, Standard Delineation, and Dark. 

2. Conditions of Fully Illuminated and Ex- 
perimental Reflectorization appeared equally 
effective in reducing the incidence of driver 
difficulty in traversing the intersection. 

3. Over half the drivers under Conditions 
IV and V identified the pavement reflectoriza- 
tion as indicating areas of merging and/or 
exiting traffic. 

4. It was the opinion of the large majority 
of drivers under Conditions IV and V that the 
Experimental Reflectorization was an effec- 
tive and helpful means of providing night 
driving guidance. 


The over-all results of this study suggest
that reflectorization as well as illumination
can be regarded as an effective means of re- 
ducing driving problems related to nighttime 
visibility conditions.


RESPONSE SET AND THE PREDICTION OF CLERICAL
JOB PERFORMANCE


PHILIP H. KRIEDT
Prudential Insurance Company

ROBERT I. DAWSON
Equitable Life Assurance Society


Considerable attention has been given
recently to the question of response set in
personality measurement. It has been shown that
much of the variance in a number of common
self-description questionnaires is interpretable
in terms of response set rather than item
content (Barnes, 1956; Edwards, 1957; Hanley,
1956; Messick & Jackson, 1958). Research
has been directed mainly at the identification
of various kinds of set or bias, social
desirability, and acquiescence for instance, and at
techniques for controlling set so that content
scales can be more meaningful. Although
several writers including Cronbach (1950) and
Edwards (1957) have noted that in some
instances response set scores might themselves
provide valid measures, this point has
generally not been emphasized.

In the construction of personality tests
concern for response set, if there has been any
concern at all, has usually resulted in attempts
to control it. This has been done by the use of
correction or suppressor keys such as the K
scale of the MMPI, by the avoidance of items
that are related to response sets as in the
"subtle" keys developed for the MMPI
(Wiener, 1951), by use of forced-choice item
forms, or by special administrative techniques
such as the "side by side" method
recommended by Voas (1958). Jackson and
Messick (1958), however, have pointed out the
need for deliberate attempts to increase
response set as a way for finding valid
predictors of behavior, and Berg (1957) has
argued provocatively that response set is the
important factor to measure in personality
assessment and that item content is of little
importance. He has used a test of meaningless
abstract designs to measure "deviant"
response set and has found that it differentiates
normal and deviant behavioral groups (1955).

Support for Berg's viewpoint is presented
in this article which reports a study in which
the Gordon Personal Inventory was given to
a group of clerical workers. The study
provides the rather ironic finding that the
inventory, a forced-choice test, was successful
in predicting job performance ratings not by
controlling response set but rather because it
permitted response set to affect the scores.

4 DESCRIPTION OF THE SCALI 


In constructing the Personal Inventory, Gordon (1953) wanted to use a forced-choice item format, but he was also concerned with making the test as acceptable as possible to test takers. He therefore selected the tetrad forced-choice form in which two equally favorable phrases are paired with two equally unfavorable phrases. The respondent is then asked to check one phrase as most like him and another as least like him. One is not completely forced, consequently, to prefer one of the two items of equal preference value and, as Berkshire and Highland (1953) have pointed out in their review of forced-choice rating procedures, this type of item is apt to permit considerable response bias.


The Personal Inventory yields four measures which are called: Cautiousness, Original Thinking, Personal Relations (trust and confidence in others), and Vigor. Each trait is represented once in each tetrad. Each trait is described approximately 10 times by complimentary phrases and 10 times by uncomplimentary phrases. If the respondent marks a complimentary phrase as most like him or an uncomplimentary phrase as least like him, he gets a +1 on that scale. If he marks a complimentary phrase as least like him or an uncomplimentary phrase as most like him, he gets a −1 on that scale. For each tetrad, he may score plus on two traits, minus on two traits, or a plus on one scale and a minus on one scale. The highest possible score on any scale is +20 and the lowest possible score −20. A fifth score, called the Total score, is obtained by adding the four trait scores algebraically. Total score may vary from +40 to −40. Total score, it is important to note, is strictly a response set score measuring the respondent's willingness or unwillingness to check unfavorable responses. If he checks only socially desirable phrases as most like him and socially undesirable items as least like him his total score is +40. If he checks only socially undesirable phrases as most like him and socially desirable items as least like him his total score is −40. Also, each of the four trait scores depends partly on whether or not the respondent is willing to say unfavorable things about himself. If he always says favorable things, the trait score must fall between 0 and +20. If he checks some unfavorable replies, a trait score may be negative.
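The scoring rules just described can be sketched in code. The following is a minimal illustration and not Gordon's published key: the tetrad layout built by `demo_tetrads`, the function names, and the data structures are assumptions introduced here, and only the +1/−1 rule and the algebraic Total come from the text.

```python
TRAITS = ("Cautiousness", "Original Thinking", "Personal Relations", "Vigor")


def demo_tetrads(n=20):
    """Hypothetical layout: each tetrad holds one phrase per trait,
    two favorable (True) and two unfavorable (False)."""
    return [
        {t: (t in TRAITS[:2]) == (i % 2 == 0) for t in TRAITS}
        for i in range(n)
    ]


def score_inventory(tetrads, responses):
    """responses: one (most_like_trait, least_like_trait) pair per tetrad."""
    scores = {t: 0 for t in TRAITS}
    for favorable, (most, least) in zip(tetrads, responses):
        if most == least:
            raise ValueError("most-like and least-like must be different phrases")
        # Favorable marked "most like" or unfavorable marked "least like": +1.
        # Favorable marked "least like" or unfavorable marked "most like": -1.
        scores[most] += 1 if favorable[most] else -1
        scores[least] += -1 if favorable[least] else 1
    # Total is the algebraic sum of the four trait scores.
    scores["Total"] = sum(scores[t] for t in TRAITS)
    return scores


tetrads = demo_tetrads()
# A respondent who never says anything unfavorable about himself.
always_favorable = [
    (next(t for t in TRAITS if fav[t]), next(t for t in TRAITS if not fav[t]))
    for fav in tetrads
]
result = score_inventory(tetrads, always_favorable)
print(result["Total"])  # 40
```

With this hypothetical layout, a respondent who always endorses the socially desirable alternative obtains a Total of +40 and no negative trait score, while the reverse pattern gives −40, which is why Total behaves as a response set score.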


RESULTS WITH OFFICE EMPLOYEES 


As a part of a tryout of several personality 
tests, the Gordon Personal Inventory was 
completed by 41 employees on beginning level 
jobs in an insurance company. The group was 
composed primarily of women with from 1 to 
3 years of service who were told that they 
were taking the test for experimental pur- 
poses and that it would in no way affect their 
job status. The immediate superiors of these 
employees ranked them on several job per- 
formance factors, and an overall performance 
evaluation for each employee was derived 
from these rankings. 

First of all, it is of interest to note the extent to which clerks will check uncomplimentary self-description phrases in a business setting. In this instance, they did so over one-fifth of the time. Of the 1,640 responses checked by these 41 women, 360 or 22% were instances of a complimentary phrase checked as least like her or an uncomplimentary phrase checked as most like her. No one endorsed all the socially desirable responses. The highest total score was +38, and the lowest 18. The social desirability of all items for this group was the same as for Gordon’s experimental group, that is, there were no instances in which the majority of the group preferred an uncomplimentary phrase.

Since these clerks frequently endorsed both complimentary and uncomplimentary phrases, each of the four trait keys is related to the Total or social desirability score. The intercorrelations among the four trait scores and the Total score and the correlations of each scale with the rating of job performance made by the supervisors are shown in Table 1.

TABLE 1
CORRELATIONS AMONG SCORES OF GORDON PERSONAL INVENTORY AND RATINGS OF OVERALL JOB PERFORMANCE (N = 41)
[Row labels: Cautiousness, Original Thinking, Personal Relations, Vigor, Total score. Surviving column headings: Original Thinking; Personal Relations; Partial r with Rating, Total Score Held Constant. Figures surviving under the last heading, without signs or decimal points: 10, 16, 10, 11. Other entries and the table note were not recovered.]

All four trait scores are highly related to the Total or social desirability score. Although all four trait scores are moderately related to the criterion, these validities may be due to the social desirability factor in each trait score rather than the forced-choice trait measurement. The highest validity is for the Total or social desirability score. When Total score is partialed out of the trait score validities, it can be noted in Table 1 that these validities shrink considerably and instead of all four being positive, two are positive and two negative.

It appears that the success of the inventory for predicting clerical performance in this situation is due mainly to the social desirability response set which affects all the scores.
These results suggest that one should not interpret validities obtained with trait scores of the Gordon Personal Inventory as necessarily being trait validities. Trait score validities may be due to response set. Gordon gives no evidence of validity in his manual for the inventory, but for the profile, he reports data for two validity studies. One study is with 22 deputy sheriffs and all four trait scores correlate positively (correlations of .21, .50, .25, and .34) with buddy ratings. No validity for total score is given, but since all four trait validities are positive, it is clear that the total score would have moderately high validity. It is certain that the validities for the trait scores are due partly and perhaps mainly to the social desirability response set. Another study is reported comparing ratings of 30 department store salespeople and profile scores. Salespeople with superior ratings have significantly higher scores than those with low ratings on all four trait scores. Again, it is likely that all of these differences are due in large part to the social desirability response set since total score shows a similar difference between the groups. The results with insurance clerks reported here may not be unusual. Perhaps much of the predictive strength of the Gordon tests lies, paradoxically, in their ability to measure a social desirability response set.

Another way of holding the social desirability factor constant in determining the validity of trait scores is to score the trait keys in a different manner. Trait scores which are virtually independent of the social desirability response set can be obtained in the following way. Divide the tetrads into pairs of favorable and unfavorable phrases. If one phrase in a pair of favorable phrases is marked “most like,” assign +1 for that trait and −1 for the other trait regardless of whether it has been marked “least like” or not marked at all. Similarly, if one phrase in a pair of favorable phrases is marked “least like” assign −1 for that trait and +1 for the other trait regardless of whether it is marked “most like” or not marked at all. Pairs of unfavorable phrases would be scored in the same manner but in the opposite direction. If both phrases in a couplet are unchecked they would be scored zero. The algebraic total of trait scores obtained in this way is necessarily zero and a response set score cannot be obtained from such trait scores.
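The revised scoring can be sketched as follows. This is one reading of the rule stated above, under the assumption that each couplet is scored once according to which of its two phrases was treated as relatively more like the respondent. The helper names and the 20-tetrad layout are hypothetical and are not taken from the inventory itself.

```python
TRAITS = ("Cautiousness", "Original Thinking", "Personal Relations", "Vigor")


def score_couplet(pair, favorable, most, least):
    """Score one couplet (two phrases of equal favorability).

    Returns {trait: -1, 0, or +1}; the two values always sum to zero.
    """
    a, b = pair
    if most in pair:
        preferred = most                      # phrase marked "most like"
    elif least in pair:
        preferred = b if least == a else a    # partner of the "least like" phrase
    else:
        return {a: 0, b: 0}                   # couplet left unchecked
    other = b if preferred == a else a
    sign = 1 if favorable else -1             # unfavorable couplets reverse direction
    return {preferred: sign, other: -sign}


def revised_scores(tetrads, responses):
    """tetrads: list of dicts trait -> True (favorable) / False (unfavorable),
    assumed to contain exactly two of each.
    responses: list of (most_trait, least_trait), one per tetrad."""
    scores = {trait: 0 for trait in TRAITS}
    for favorable, (most, least) in zip(tetrads, responses):
        good = tuple(t for t in TRAITS if favorable[t])
        bad = tuple(t for t in TRAITS if not favorable[t])
        for pair, is_favorable in ((good, True), (bad, False)):
            deltas = score_couplet(pair, is_favorable, most, least)
            for trait, delta in deltas.items():
                scores[trait] += delta
    return scores


# Hypothetical layout: two traits favorable, two unfavorable, alternating.
tetrads = [
    {t: (t in TRAITS[:2]) == (i % 2 == 0) for t in TRAITS} for i in range(20)
]
```

Because every couplet contributes +1 to one trait and −1 to the other, or nothing at all, the four revised trait scores sum to zero for any response pattern, so no Total score can be formed from them.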

The writers rescored the 41 inventory tests in this manner and obtained correlations for the four trait scores vs. job performance ratings that were almost identical to the trait score validities with Total partialed out reported in Table 1. Validity coefficients obtained for the revised trait scores were: Cautiousness .08, Original Thinking 04, Personal Relations 09, Vigor .04.
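Partialing Total score out of a trait validity is conventionally done with the first-order partial correlation formula. The sketch below assumes that standard formula was the one used. The function name and the correlations in the example are hypothetical and are not values from Table 1.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    x = trait score, y = performance rating, z = Total score.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("partial correlation undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical example: a trait that correlates .30 with the rating and .70
# with Total, while Total correlates .40 with the rating.
print(round(partial_r(0.30, 0.70, 0.40), 2))  # 0.03
```

In this illustrative case a trait validity of .30 shrinks to about .03 once Total score is held constant, which is the kind of shrinkage the text describes.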


SUMMARY 


A comparison of Gordon Personal Inventory scores and job performance ratings for a group of insurance clerks showed that three of the four trait scores and also the Total score, a social desirability response set score, had significant positive correlations with the job performance criterion. When Total score was partialed out of the trait score validities, however, these validities disappeared. Also it was found that when trait scores which are free of response set were obtained by a revised scoring procedure, the four trait scores had zero validities. Results for two studies reported by Gordon in his manual for the profile, a forced-choice test similar to the inventory, suggest that response set may account for the validities reported.

This study furnishes an illustration of the possible value of response set as a predictor measure. Response set is not necessarily a nuisance factor in personality measurement and therefore something which should be eliminated. Measurement of response sets may prove to be a valuable approach to personality assessment.


DEVELOPMENT AND VALIDATION OF SYNTHETIC
DEXTERITY TESTS BASED ON ELEMENTAL
MOTION ANALYSIS¹


DONALD W. DREWES


Industrial Psychology Center, North Carolina State College


The difficulties involved in choosing appropriate dexterity tests for use in personnel selection have been exemplified in numerous studies (Candee & Blum, 1937; Ghiselli & Brown, 1955, pp. 218-235; Treat, 1929; Viteles, 1932). Dexterity tests thought to be related to job efficiency often were surpassed by other dexterity tests which appeared to bear little similarity to the jobs in question. Dexterity tests which were found to bear some relation to successful performance on one job often failed to generalize to other jobs which appeared to require essentially the same abilities.

Factorial studies of manual dexterity have shown that human motor performance cannot be accounted for by a single ability factor (Fleishman & Hempel, 1954, 1956; Harrell, 1940; Seashore, 1951). Rather, the evidence suggests the presence of narrow bands of specific group factors. This specificity of the constituent factors of complex motor abilities makes general prediction of job success difficult. Motor abilities required for the successful performance of jobs of apparently similar nature often vary, thereby necessitating the validation of present dexterity tests on each job by conventional methods. Since validity cannot be adequately generalized from one situation to another, the test developer is forced to tailor-make a testing program for each situation with little but an educated guess as to what tests will meet the local requirements.

Many dexterity tests currently in use measure performance of single elementary motions or limited combinations of these motions. Because of the limited motion patterns utilized, present dexterity tests often do not match the patterns of motion that characterize many industrial jobs. As Griffin (1957) points out, dexterity tests of the pegboard type have an inefficient layout of the testing task resulting from pins, collars, and washers being placed in recessed bins instead of arranged in an arc corresponding to the layout of a benchworker’s task.

¹ This paper is based on a thesis submitted to the faculty of Purdue University in partial fulfillment of the requirements for the PhD degree. The research was supported by funds granted by the Purdue Research Foundation.

The purpose of this research is the development of a series of dexterity tests which incorporate many of the motion patterns used in bench-assembly jobs. It is hypothesized that the predictive validity of a test which essentially duplicates or simulates the sequence of motion elements used on a job is greater than the validity of a test that does not intentionally simulate the sequence of motion elements. Since the specificity of motor skills has tended to limit the predictive validity of conventional dexterity tests, the simulation of actual motion patterns used on the job may offer new possibilities for utilizing psychomotor tests as job predictors.


METHOD

Test Development

A system of work elements [...] variety of manipulative jobs [...]. Because factors such as distance moved, ease of grasp, and size of object conceivably might require [...] abilities, only systems which included [...] were considered. The system that best [...] was the Methods-Time Measurement (MTM) system of predetermined times (Maynard, Stegemerten, & Schwab, 1948). The MTM system utilizes the following seven motion elements: Reach, [...], Grasp, Position, Disengage, and Release. The test design thought to be best characterized in terms of the MTM system was a modification of the conventional pegboard. A pegboard design was chosen because the MTM motions of Reach, Grasp, Position, and Release of object being assembled are present in almost every assembly job and are essentially the same motion elements as those involved in placing pins in a board.

In order to duplicate the MTM element Position, three variables were taken into account: class of fit, symmetry, and difficulty of handling of object. Class of fit according to MTM definition is determined by clearance between pin and hole and pin dimension. Symmetry between pin and target varies from infinite assembly possibilities to a single assembly position. Ease of handling is dependent upon size and shape of object being assembled.

The effects of symmetry were duplicated by designing boards with three shapes of holes: round, square, and pentagonal. The boards having round holes simulated operations in which a pin could be inserted in any of an infinite number of positions, boards having square holes represented operations in which a pin could be positioned only four ways, and boards having pentagonally shaped holes represented operations in which a pin could be positioned and inserted only one way. The latter positional restriction resulted from the fact that all the sides of the pentagon were not of equal length.

Variation in the ease of handling was accomplished by designing pins of two lengths, one length corresponding to the length necessary to qualify an object as being easy to handle and a longer length sufficient to qualify an object as being difficult to handle according to the MTM standards.

The MTM system has specific requirements concerning which MTM elements are appropriate, given a particular combination of pin dimension and clearance. Since it was not feasible to duplicate all pin sizes, four sizes were selected. Appropriately shaped pins were designed for each board, with the exception of the smallest dimension of the pentagonally shaped pin and several intermediate dimensions of the square pins which were excluded. The exact tolerances made the cost of machining the smallest pentagonal pins prohibitive, and productional difficulties prevented the building of a complete set of square boards.

The clearance factor was accounted for by designing two boards for certain combinations of pin shape and size. One board corresponded to loose tolerance and the other board corresponded to tight tolerance as specified by the MTM system.

The simulation of bimanual operations was accomplished by designing the boards as trays filled with small blocks. In this manner, the blocks could be removed from the trays and handled separately.

The experimental test model as designed consisted of 14 boards and 18 sets of pins. The complete assemblage of boards and pins was entitled the Purdue Elemental Motions Tests (PEMT). A dimensional description of the PEMT is presented in Table 1.

Each board of the PEMT is essentially a wooden tray filled with 30 wooden blocks arranged in three rows of 10 blocks each. The outer dimension of each tray is 16[...] × 5[...] × 1[...] inches, and the inner dimension is 15[...] × 4[...] inches. Each wooden block is 1[...] × 1[...] inches with a hole center[...] inch in the [...]

TABLE 1
DIMENSIONAL DESCRIPTION OF THE PEMT
[Table body not recovered.]

TABLE 2
COMPOSITE CONTINGENCY TABLES [...] (N = 72)
[Surviving labels: Most Appropriate PEMT, Variation 2; Least Appropriate PEMT; criterion classification High and Low; row groups All Jobs and Assembly Jobs. Cell entries not recovered.]


Each of the boards designated [...] (Table 1) contains blocks which [...] with a triangular insert placed in [...].

Reliability Analysis

Reliability estimates were obtained for [...] testing tasks involving various combinations of distance reached, shape of pins, length of pins, and size of pins. Four testing tasks were administered to each of two groups (N = 9, 3[...]) of college students. The time required for the testee to insert the first 15 and last 15 pins was recorded for each task, and [...] reliability coefficients were computed.

Validation Procedure

The validation sample consisted of [...] employees of a large communications equipment manufacturing company. Nine jobs were represented in the sample, with eight workers in each of the nine jobs. Each job was so chosen because a high degree of dexterity appeared to be essential for successful job performance. All of the jobs were bench jobs involving manual rather than machine operations.

The criterion was an efficiency index which indicated a worker’s productivity in relation to standards established by the company. When two or more workers on the same job received identical efficiency ratings, the ties were broken by having the supervisors rank the tied workers.

The sample group (N = 8) from each job consisted of four workers having low average efficiency indices and four workers having high average efficiency indices. Whenever possible [...] chosen [...] extremes of the [...]. However, on certain [...] efficiency indices available resulted [...] between the indices of [...] who had been on the [...] excluded from the sample [...] ranked for each job.

Four testing [...] subjects, and the time taken to complete [...] was recorded. Three of the tasks were variations of the PEMT; the fourth was the Minnesota Rate of Manipulation Test (MRM), a conventional dexterity test. Because of the complexity of the jobs, two variations of the PEMT were administered to simulate better the motion patterns of each job. These two variations were [...] the Most Appropriate PEMT, Variation A, and the Most Appropriate PEMT, Variation B. In order to test the hypothesis that validity was a function of the similarity between testing task and job, a variation of the PEMT, whose motion patterns were purposely mismatched with those of the job, was used. This variation was referred to as the Least Appropriate PEMT. The fourth test, the turning test of the Minnesota Rate of Manipulation Test, was included because it was thought to be representative of a conventional dexterity test requiring motion patterns of a different type than those required by the PEMT.

The most and least appropriate variations of the PEMT were determined for each job. The selection of the testing tasks to be used for each job was dependent upon the motion elements used in the performance of each job. Particular attention was given to such factors as tolerances of objects requiring assembly, shape and size of objects handled, distances reached, and the type of grasp required to handle the object. Given this information, the variations of the PEMT which appeared to offer the best chance for simulation of the job motions were selected. Conversely, the information concerning each job served as a basis for the selection of the particular variation of the PEMT thought least likely to simulate actual job motions.

The information concerning the motion patterns of each job was secured by consultation with the industrial engineer in charge of setting the standards for each of the nine jobs. Since the engineer was familiar with the motion patterns of each job, he was able to suggest rather unique ways of simulating the actual job pattern.


Statistical Analysis 


Median tests were used in the evaluation of the relationship between test scores and criterion. The high criterion and low criterion groups on each job were classified according to the number above or below the median test score for that job. The relationship between each of the four tests and the criterion was evaluated by combining the individual contingency tables for each job into a single contingency table and computing the phi coefficient. The same procedure was followed for a subset of the nine jobs comprising four jobs which were essentially assembly-type operations.
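The pooling procedure described above can be sketched as follows. The function names and the time scores are hypothetical, and the handling of scores that fall exactly at the median (they are dropped) is an assumption, since the text does not say how such cases were treated.

```python
import math
from statistics import median


def job_table(times, high_criterion):
    """2 x 2 table for one job.

    Rows: high / low criterion group. Columns: faster / slower than the
    job's own median time score.
    """
    cut = median(times)
    table = [[0, 0], [0, 0]]
    for t, high in zip(times, high_criterion):
        if t == cut:
            continue  # assumption: scores exactly at the median are dropped
        table[0 if high else 1][0 if t < cut else 1] += 1
    return table


def pool(tables):
    """Add the per-job tables cell by cell into one composite table."""
    return [[sum(t[r][c] for t in tables) for c in range(2)] for r in range(2)]


def phi(table):
    """Phi coefficient of a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


# Two hypothetical jobs of eight workers each, four high and four low criterion.
high = [True] * 4 + [False] * 4
job_1 = job_table([10, 11, 12, 13, 20, 21, 22, 23], high)
job_2 = job_table([10, 11, 20, 21, 12, 13, 22, 23], high)
pooled = pool([job_1, job_2])
print(pooled, phi(pooled))  # [[6, 2], [2, 6]] 0.5
```

Splitting at each job's own median before pooling removes differences in average speed between jobs, so the composite table reflects only within-job agreement between test and criterion.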

The subset of four jobs classified as assembly jobs generally conformed to the stereotyped conceptualization of bench-assembly jobs. The jobs classified as nonassembly, on the other hand, involved manual operations which were not primarily of an assembly nature. Several of these jobs, for example, involved the manipulation and soldering of flexible wires. Other nonassembly jobs required the inspection and adjustment of small electrical units rather than the assembly of small parts.

Since two variations of the PEMT were used to simulate the motion patterns of each job, some means of deriving a composite total score for the two tests was needed. Although the problem was analogous to multiple regression in parametric statistics, no comparable method was known for nonparametric methods. Therefore, a trial and error method of determining appropriate weights was used. The time scores on the Most Appropriate PEMT variations were transformed into z scores. Several weighting schemes were tried and phi coefficients computed. The weighting scheme selected was the one which resulted in the highest coefficient between the weighted score and the criterion when computed over the nine jobs.
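The trial and error weighting can be illustrated with a short sketch. It is simplified to a single group of eight hypothetical workers rather than nine pooled jobs, and the function names, the data, and the use of the population standard deviation for the z scores are assumptions made here for illustration.

```python
import math
from statistics import mean, median, pstdev


def z_scores(xs):
    """Standardize time scores (assumes the scores are not all equal)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def phi_from_split(scores, high_criterion):
    """Phi between criterion group and a median split of the scores."""
    cut = median(scores)
    a = b = c = d = 0
    for s, high in zip(scores, high_criterion):
        if s == cut:
            continue  # assumption: scores exactly at the median are dropped
        fast = s < cut  # lower time score means faster performance
        if high and fast:
            a += 1
        elif high:
            b += 1
        elif fast:
            c += 1
        else:
            d += 1
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def best_weights(times_a, times_b, high_criterion,
                 schemes=((1, 1), (1, 2), (1, 3))):
    """Try each (weight_A, weight_B) scheme and keep the highest phi."""
    za, zb = z_scores(times_a), z_scores(times_b)
    results = {}
    for wa, wb in schemes:
        composite = [wa * x + wb * y for x, y in zip(za, zb)]
        results[(wa, wb)] = phi_from_split(composite, high_criterion)
    return max(results, key=results.get), results


# Hypothetical time scores for eight workers (first four: high criterion).
high = [True] * 4 + [False] * 4
times_a = [10, 12, 14, 14, 13, 13, 15, 17]
times_b = [10, 11, 12, 13, 14, 15, 16, 17]
best, results = best_weights(times_a, times_b, high)
print(best, results)
```

Because the weights are chosen to maximize phi on the same sample, the winning coefficient capitalizes on chance to some degree, so a weighting found this way would ordinarily need checking on fresh data.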


RESULTS 


The validation results are shown in Tables 2 and 3. As is readily seen in Table 2, both Variations A and B of the Most Appropriate PEMT exhibited significant validities when computed for all nine jobs. The Least Appropriate PEMT and the MRM did not exhibit a significant relationship between job proficiency and test scores when computed over all nine jobs.

TABLE 3
COMPOSITE CONTINGENCY TABLES RESULTING FROM THREE WEIGHTING PROCEDURES APPLIED TO VARIATIONS 1 AND 2 OF THE MOST APPROPRIATE PEMT
[Weighting procedures: Variations 1 and 2 given equal weighting; Variation 2 given twice the weight of Variation 1; Variation 2 given three times the weight of Variation 1. Columns: criterion groups (High, Low) by composite score, for All Jobs and for Assembly Jobs. Surviving entries: 10.125**, 563, 3, 3, 3, 13, 10.125**, 563.]
* Significant at .05 level.
** Significant at .01 level.

When the subset of four assembly-type jobs 
was considered separately, both the most and 
the least appropriate tests of the PEMT were 
found to have significant validity coefficients. 
The MRM failed to show a significant rela- 
tionship between test scores and criterion 
when only the assembly-type jobs were con- 
sidered. None of the four tests had significant 
validity coefficients when computed over the 
five nonassembly jobs. 

Table 3 shows the contingency tables and 
the validity coefficients that resulted when 
different weighting procedures were used to 
combine Variations A and B of the Most 
Appropriate PEMT. Three different weight- 
ing procedures were compared: Variations A 
and B given equal weight, Variation B given 
twice the weight of Variation A, and Varia- 
tion B given three times the weight of Varia- 
tion A. The procedure whereby Variation B 
was given twice the weight of Variation A was
chosen because it resulted in the highest
validity when considered over all the nine
jobs. No other weighting procedures were
tried because it was felt that no weighting
procedure could produce a composite score 
which would have a validity coefficient larger 
than the largest validity coefficient between 
the individual variations of the PEMT and 
the criterion. 


The eight reliability estimates ranged from 
.711 to .900 with a median of .866. Since the 
reliability coefficients were of similar magni- 
tude, they were considered to represent a 
close approximation to the reliability pa- 
rameter. 

DISCUSSION 


It is hypothesized that a series of dexterity 
tests which could be used to simulate a large 
number of motion patterns characterizing 
many assembly-type jobs would result in 
higher predictive validities than would a con- 
ventional dexterity test involving a fixed 
pattern of motion. This hypothesis is generally
supported by the results of the validation
study. The validity coefficients of the
MRM when computed over all jobs and when 
computed for assembly and nonassembly jobs 
separately were zero in every case. The va- 
lidities of the PEMT, on the other hand, 
appeared to generalize from one job to an- 
other. When evaluated over all nine jobs, 
both variations of the Most Appropriate 
PEMT expressed a significant relationship 
with the criterion. The fact that the relation- 
ship of the Least Appropriate PEMT over all 
nine jobs failed to be significant lent further 
support to the hypothesis. 

When the nine jobs were subdivided into four assembly and five nonassembly jobs, the Least Appropriate PEMT and both variations of the Most Appropriate PEMT were found to have significant validities over the assembly jobs and nonsignificant validities over the nonassembly jobs. The magnitude of the validity coefficient for the Least Appropriate PEMT for the assembly jobs was larger than expected, since it was hypothesized that the Least Appropriate PEMT would have a lower relationship with the criterion than would the Most Appropriate PEMT. However, this hypothesis is partially supported by the fact that the validity for Variation B of the Most Appropriate PEMT exceeded the validity of the Least Appropriate PEMT when computed only for assembly jobs. The significance of the difference between these validity coefficients could not be properly evaluated since no appropriate statistical technique was known. An approximation using the standard error of uncorrelated tetrachoric correlations computed from data dichotomized at the median indicated a significant difference between the validity coefficients. The results, however, were accepted with reservation, since tetrachoric correlations based on small sample sizes are often unstable.

Variation B of the Most Appropriate PEMT appeared to be a more valid predictor of criterion performance than Variation A. The higher predictive validity of Variation B might possibly be interpreted as resulting from a transfer of training or practice effect, as Variation A always preceded Variation B in the sequence of test administration. If such an effect were operating, the practice received on Variation A would transfer to Variation B, causing Variation B to represent a more valid measure of the true time score than Variation A. This possibility is expressed in the derived weighting procedure whereby Variation B is given twice the weight of Variation A.

A significant result of the validation study was the finding that the PEMT could not be used effectively to duplicate nonassembly operations. If the PEMT is to be effective as a predictive instrument, it must be restricted to assembly-type operations until the coverage can be extended to include the motion patterns found on jobs involving activities other than the assembly of rigid parts into fixed location receptacles.


CONCLUSIONS 


The results of this research indicate that the predictive utility of psychomotor tests may be substantially increased by developing tests which will more closely approximate actual motion patterns used on the job. Instead of concentrating on the macro aspects of overall job performance, it may be profitable to divide the job into micro units and to develop means of predicting performance on appropriate sequences of these micro units. In this manner, it may be possible to synthesize predictors for particular jobs based on the microanalysis of the motion patterns involved. New jobs may conceivably be analyzed and the appropriate selection instruments designed before the job is actually in existence on the shop floor. Selection of tests on the basis of microanalysis of motion patterns may substantially reduce the amount of guesswork involved in the selection of predictive tests by providing the practitioner with a formal procedure for test selection.

The results of this research represent only 
a small start toward meeting the challenge of 
synthetically developing psychomotor tests. 
The conclusions from this study are based 
on the results obtained from one of a vast 
number of industrial enterprises in our so- 
ciety, and are limited at present to bench 
jobs of an assembly nature. Notwithstanding 
these limitations, the approach advocated in 
this research appears to offer new and challenging
opportunities for increasing the utility
of psychomotor tests as predictors of manual 
job performance. 

In summary, the following conclusions ap- 
pear warranted on the basis of the results of 
this study: 


1. A dexterity test designed to simulate a large number of motion patterns characterizing assembly-type jobs will generally yield higher predictive validities than will conventional dexterity tests involving rather different patterns of motion.


2. The selection of appropriate predictive 
tests on the basis of job motion patterns offers 
the test practitioner a formal procedure for 
the selection of dexterity tests. 

3. The validities of certain tests may be generalized to similar job situations without the benefit of conventional validation procedures.


SUMMARY 


A series of pegboard tests entitled the Purdue Elemental Motions Tests (PEMT) was designed to incorporate many of the motion elements used in the Methods-Time Measurement predetermined time system. Because the PEMT was thought to offer a greater possibility for the simulation of motion patterns of manipulative jobs than do many conventional dexterity tests, it was hypothesized that the predictive validity of the PEMT would be generally higher than the validity of conventional dexterity tests involving motion elements which were not tailor-made for the individual jobs. It was further hypothesized that a subtest of the PEMT selected to simulate the pattern of motion elements would be a more valid predictor of job success than would a subtest of the PEMT purposely selected to misrepresent the actual motion elements.

Validation on an industrial sample supported the hypotheses. The validity of the PEMT exceeded that of the conventional dexterity test when generalized over all jobs sampled and when generalized over a subset of assembly jobs. The validity of the Most Appropriate PEMT variations exceeded the validity of the Least Appropriate PEMT when generalized over all jobs. It was concluded that if dexterity tests could be validated on component sequences of motion [...] motion and [...] patterns, untapped areas of application may be uncovered.


SITUATIONAL EFFECTS ON A PROJECTIVE TEST¹


BERNARD MAUSNER 


Graduate School of Public Health, University of Pittsburgh


When a psychological test of any kind is
administered, the degree to which the results
can be attributed to temporary situational
effects is always in question. Although this is
especially true for questionnaires and intelligence
tests, there is some evidence that the
needs of the moment will affect a TAT-like
instrument in a predictable way (McClelland,
Atkinson, Clark, & Lowell, 1953). In fact,
this effect is made the basis for a measure of
needs. It can certainly be demonstrated that
many kinds of tests can be faked by subjects
following instructions to adopt one or another
set (cf. references cited by Heron, 1956).

Demonstrations on a quasi-projective test
of a situational effect in which powerful
motives are roused by a stress related to the
subject’s real life are not as easy to find as
is simulated faking. The sole example is
Heron’s study (1956) in which one-half of a
group of applicants for a position took a
battery of tests before they were hired and
the other half afterwards. Systematic differences
between the responses of the two groups
were related to the difference in set produced
by the assumption on the part of the one
group that the test mattered and on the part
of the other that it was unimportant.


¹ Many individuals were responsible for the gathering
of the data reported on here. The study in which
the survey sample was gathered was carried on by
the Research Division of Psychological Service of
Pittsburgh under a grant from the Buhl Foundation
and with the assistance of a number of industrial
firms in Pittsburgh. The research director was
Frederick I. Herzberg. Scoring of the Rosenzweig
P-F tests was done by Eva Reinkraut and Edith
E. Fleming. The organizational sample was drawn
by Reinkraut. The writer wishes to acknowledge
gratefully the cooperation of Psychological Service
of Pittsburgh in making the data accessible. Statistical
computations were carried out by Joseph
Meiri and Elaine Sloan, both of the Graduate School
of Public Health, University of Pittsburgh. The
writer is fully responsible both for the statistical
analyses and the discussion which comprise this
report. The analyses were done under support from
Grant M-2836 from the National Institutes of
Health.


The present paper reports the comparison 
of two sets of scores on the same test, each 
gathered as part of an activity unrelated to 
the other. This comparison, although it is not 
entirely free of possible ambiguities, throws 
some light on the way in which the results 
of this test were affected by the context in 
which it was taken. 

The Rosenzweig Picture-Frustration Study 
was administered to 203 engineers and ac- 
countants as part of an investigation into 
their attitudes towards their jobs (Herzberg, 
Mausner, & Snyderman, 1959). The test was 
used to examine some of the subtle factors 
in personality which might affect an indi- 
vidual’s reaction to his working situation. 
Subjects were given the test booklet, an 
orientation as to the procedure to be followed, 
and an envelope addressed to the sponsor of 
the research. The test was completed at the 
subject’s leisure and then mailed in. While
there is no way of knowing where or when the
work was done, it is likely that most of the
men filled out the booklets at their desks
shortly after the orientation. Since the investigation
of attitudes did not include any measures
of contemporary feelings, but was restricted
to stories about past periods in which
the subject was happy or unhappy in his
work, the attitudinal data give no clues as to
the subject’s frame of mind during the administration
of the test. One can assume that
the subject’s set varied randomly since the 
data were gathered in 11 different companies 
over a 6-month period. In the end 154 usable 
protocols were returned. 

Some question arose concerning the propriety
of utilizing the results of so unorthodox
a procedure. A partial check on the effects of
the procedure was available since the files of
the sponsoring organization included test
protocols from a number of men to whom
the P-F had been routinely given during the
course of an appraisal procedure. It was possible
to compare the average scores for the
sample of men in the survey with the norms
derived from these routine administrations.
To the dismay of the research team, the
scores from the survey sample differed widely
on almost every scoring category from the
organization’s norms. However, a comparison
of the study sample’s scores with the published
norms available for the general population
(Rosenzweig, 1949) showed a significant
difference on only one of the six scores
(see Table 1).


TABLE 1

P-F SCORES FOR SURVEY SAMPLE OF ENGINEERS AND ACCOUNTANTS
COMPARED TO PUBLISHED NORMS FOR MALES (ROSENZWEIG, 1949)

Scoring       Published Norms      Survey Sample (N = 154)
Category        M       SD            M       SD         t

E              ...     13.3          43.8    11.5       ...
I              ...      8.3          29.0     6         ...
M              ...      9.5          27.2     8         ...
O-D            ...      7.8          20.6     7         ...
E-D            ...     11.3          50.6    10         ...
N-P            ...     10.3          28.9     9         ...


Apparently, men in the survey sample reacted
to the test in much the same way as
men in the general population, but the same
kind of man brought into a consulting organization
for appraisal reacted differently.
Ideally, to investigate the difference one
would need matched groups given the P-F
under neutral conditions and under appraisal.
Until such a study could be carried out, it
was decided to take a closer look at the already
available data. This was done by comparing
the survey population with a group of
men drawn for this purpose from the records
of the organization. These groups were
matched as carefully as the information available
permitted, within the limits set by the
number of cases in each population.

The first step was to withdraw from the
consulting organization’s files the test protocols
of all men who fit the occupational
criteria for the survey. These men were engineers
and accountants holding jobs above the
routine level of clerical or drafting work, but
below the level of company officers. Many of
them worked for the same companies as the
men in the survey. These 60 protocols were
rescored by the same team of psychologists
which had scored the tests administered
during the survey.

A comment on the scoring scheme would be
useful at this point. The P-F consists of 24
cartoons, each depicting a potentially frustrating
situation. The protagonist in the
cartoon is the frustrated figure. Coming from
his mouth is an empty balloon. The subject is
instructed to write into the balloon “what
the man is saying.” These comments are
categorized twice. In the first scheme, they
are identified as extrapunitive (E), intrapunitive
(I), or impunitive (M). Thus, a
response indicating aggressive lashing out at
the source of frustration is scored E; a response
in which the frustration results in
aggression turned inward is scored I; a
response which attempts to side-step the issue
of frustration and pretend to its nonexistence
is scored M. Each response is also, in the second
and parallel scheme, described as showing
one of three qualities of reaction to frustration.
The first of these is object dominance
(O-D), which reflects a concern with the
circumstances of the frustration. The second
is ego-defensiveness (E-D), in which the emphasis
is on the wound to the subject’s self-esteem.
The third category is need-persistence
(N-P), which focuses on the necessity for
solving the problems raised by the frustrating
circumstances. The Rosenzweig manual
(Rosenzweig, Fleming, & Clark, 1947) was
used as a basis for scoring. Each test was
scored independently by two psychologists.
One of these, Edith Fleming, was a member
of the original group which standardized the
test. After considerable discussion, the two
psychologists responsible for the scoring arrived
at a consensus for the small number of
items for which their independent scoring had
diverged.

In accordance with Rosenzweig’s original 
procedure, the scores are reported as percent- 
ages. Only tests in which at least 22 of the 
24 responses could be scored were included. 
The percentages are given with the total num- 
ber of scorable responses, 22, 23, and 24, 
respectively, as a base. 
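
The percentage rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the data layout (one category code per cartoon, `None` for an unscorable response), and the example protocol are hypothetical and do not come from the study; only the 24-item length, the 22-response minimum, and the use of the number of scorable responses as the base are taken from the text.

```python
from collections import Counter

N_ITEMS = 24          # cartoons in the P-F (from the text)
MIN_SCORABLE = 22     # protocols with fewer scorable responses are excluded
DIRECTION = ("E", "I", "M")  # extrapunitive, intrapunitive, impunitive


def direction_percentages(responses):
    """Return percentage scores for E, I, and M, or None if the protocol
    has too few scorable responses.

    `responses` is a hypothetical layout: a list of 24 entries, each one of
    "E", "I", "M", or None when the response could not be scored.
    """
    if len(responses) != N_ITEMS:
        raise ValueError("a protocol must contain 24 responses")
    scored = [r for r in responses if r is not None]
    if len(scored) < MIN_SCORABLE:
        return None  # protocol excluded from the analysis
    counts = Counter(scored)
    base = len(scored)  # 22, 23, or 24, as the case may be
    return {cat: 100.0 * counts[cat] / base for cat in DIRECTION}


# Hypothetical protocol: 23 scorable responses, so the base is 23.
example = ["E"] * 11 + ["I"] * 6 + ["M"] * 6 + [None]
print(direction_percentages(example))
```

The same function could be applied to the second scheme (O-D, E-D, N-P) by changing the category tuple; using the count of scorable responses as the base keeps the three percentages summing to 100 for every included protocol.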







Table 2 shows the comparison of the scores
from the two samples matched only with
respect to general occupation. Significant differences
emerge in both areas in the scoring
scheme; the men who took the test as part
of an appraisal showed significantly less aggression
(E), and significantly more evasion
of the frustration (M); they showed significantly
less ego-defensiveness (E-D) and
significantly more problem solving (N-P).
Unfortunately for the acceptability of these
differences, a look at the demographic characteristics
of the two samples (see Table 3)
shows that, aside from the fact that they are
all engineers and accountants doing professional
work, they are poorly matched for
each of the three variables on which information
is available. The organizational sample
has fewer accountants, fewer noncollege men,
and fewer men past 35 than the sample in
the survey.


TABLE 2

COMPARISON OF ORGANIZATIONAL AND SURVEY GROUP’S P-F SCORES
ON EACH OF SIX SCORING CATEGORIES

Scoring       Organizational (N = 60)     Survey (N = 154)
Category        M       SD                  M       SD        t

E              ...     ...                 ...     ...       ...
I              ...     ...                29.04    6.70      ...
M              ...     ...                27.16    8.68      ...
O-D            ...     6.64               20.56    7.24      ...
E-D            ...    10.98               50.63   10.39      ...
N-P            ...     ...                28.97    9.74      ...

** Significant at .01 level.


TABLE 3

CHARACTERISTICS OF THE TWO GROUPS OF ENGINEERS AND ACCOUNTANTS
WHOSE ROSENZWEIG P-F SCORES ARE COMPARED

Subjects         Organizational (N = 60)     Survey (N = 154)

Engineers              ...                       ...
Noncollege             ...                       ...
Under 35               ...                       ...
Over 36                ...                       ...


Two further treatments were undertaken.
In the first, each of the two samples was
examined separately to note the degree to
which these scores varied as a function of the
demographic characteristics of the sample.
In the second, successive pairs of samples
matched for each of the demographic variables
were drawn and the test scores compared.


TABLE 4

COMPARISON OF PAIRS OF SAMPLES FROM ORGANIZATIONAL
AND SURVEY GROUPS MATCHED ON ONE VARIABLE
(Engineers vs. Accountants)

Accountants: Organizational N = 17; Survey N = 71


TABLE 5

COMPARISON OF PAIRS OF SAMPLES FROM ORGANIZATIONAL
AND SURVEY GROUPS MATCHED ON ONE VARIABLE
(College vs. Noncollege)

Organizational N = 51; Survey N = 75


The first treatment had little general significance
since unknown factors may have
affected the distribution of subjects along
each of the continua for which information
was available. For example, the men over 35
in the organizational sample may have been
a deviant group within their age range since
they were either looking for a job or up for
promotion or reassignment; the younger men
being considered for their first jobs might
more closely approximate a representative
sample of young engineers and accountants.
However, a brief summary of the results of
this analysis may be useful to investigators
who would be able to carry on systematic
studies of the effects of age, education, and
occupation on P-F scores.


One general finding may be noted; no differences
are found within the samples for the
scores based on direction of response (E, I, M).
The scores based on quality (O-D, E-D, 
N-P) do not differ when all engineers are 
compared to all accountants. But quality does 
seem to be affected by age. The trend in both 
samples is for need-persistent or problem 
solving responses to fall off with age. This is 
significant in the survey sample, which in- 
cludes a good many men over 35 (F = 5.15, 
df within = 151, between = 2, p < .01). N-P 
responses are also significantly less common 
among noncollege men in the survey sample 
than among college men (t = 3.27, n = 152,
p < .01); there is also a significant increase
in this group in the frequency of ego-defensive
reactions (t = 2.64, n = 152, p < .01).
For the organizational sample, only the accountants
show a significant difference between
men over and those under 35. Here
the tendency is for the older men to emphasize
the frustrating situation in their responses;
i.e., to show higher O-D scores (t = 2.54,
n = 15, p < .05).

The results of the second treatment may 
be found in Tables 4, 5, and 6. In this series, 
pairs of subsamples matched on one demo- 
graphic variable were drawn from each 
sample and the means of each of the six 
scores compared by means of a t test. As one
might expect from the variation within the 
groups described in the first treatment, the 
quality of response to aggression is not sig- 
nificantly different for any of these com- 
parisons. The differences in quality of re- 
sponse (O-D, E-D, N-P) shown in Table 2 
are, therefore, probably due to the poor 
matching of the two major samples. The fact 
that the survey sample shows fewer need-persistent
responses and more ego-defensive
responses than the group of men who took
the test in the organization’s appraisal procedure
may well be due to the presence in the
former of a sizeable body of men over 35 who
had not gone to college. Why such men should
react to an imagined frustration with ego-defensive
and object-dominated reactions
rather than an attention to the solution of
the problem is obviously not indicated by the
data of this research.


TABLE 6

COMPARISON OF PAIRS OF SAMPLES FROM ORGANIZATIONAL
AND SURVEY GROUPS MATCHED ON ONE VARIABLE
(Age under 35 vs. Age over 36)

Age under 35: Organizational N = 42; Survey N = 43
Age over 36: Organizational N = 18; Survey N = 111

* Significant at the 5% level.
** Significant at the 1% level.

The significant differences in direction of
response to frustration found in Table 4 and
the failure to find such differences in the
analysis of the internal characteristics of the
two groups support the contention that the
situation in which the test was given was
responsible. No matter what groups are drawn
from the two major samples, the administration
of the test during an appraisal results in
an increase in the frequency of responses
scored M. For three of the groups, engineers,
college men, and men over 35, this is balanced
by a significant decrease in the frequency of
extrapunitive responses.


DISCUSSION 


There are two possible explanations for 
these findings. One is that the men who took 
the test during appraisal were, if not malinger- 
ing, at least trying to present as attractive a 
picture as possible. This would lead them to 
suppress aggressive reactions and replace them 
with bland and meaningless evasions of the 
frustrating circumstances portrayed in the 
cartoons. It would hardly be surprising to 
find that some men are able to follow Whyte’s 
prescriptions for the testee (Whyte, 1956).
This explanation implies that the men were
able to penetrate the significance of the test
and manipulate their responses in a systematic
manner. While there is evidence that this
can be done for questionnaires (Heron, 1956;
Wesman, 1952), there has previously been no
evidence that a projective device like the
P-F is susceptible to such manipulation.

An alternate explanation is that the men 
were affected by the tensions of the appraisal 
in the same way that McClelland’s subjects 
are affected by his manipulation of needs for 
achievement (McClelland, 1953). Certainly, 
anyone under appraisal wants to control the 
impression he makes; this could lead to the 


evasive M response without the subject’s try- 
ing to manipulate the results of the test in any 
systematic way simply through the general 
inhibition of behavior which betrays emotion. 
The fact that there is no increase in the
socially valued problem solving (N-P) responses
argues that the differences are due
more to situational sets roused by the appraisal
situation than to systematic faking.

If the latter explanation is confirmed by
further investigation, the P-F might be a
useful instrument to measure situational sets.
For example, the ratio of impunitive to
extrapunitive responses could be used as a
sensitive indicator of the degree to which a
population felt under surveillance in a study
in which it was important to obtain independent
evidence of the salience of an appraisal
of personality factors. Potentially,
such uses of this test might turn out to be
far more valid than its original application
as a measure of ongoing personality characteristics.


SUMMARY AND CONCLUSIONS 
The Rosenzweig Picture-Frustration Study
was administered to 154 engineers and accountants.
The test booklets of a comparable
sample of 60 engineers and accountants who
had taken the Rosenzweig as part of an
appraisal procedure in a psychological consulting
organization were scored in an equivalent
manner to those of the survey sample.

Information concerning age and education
was available for these subjects. When the
variations in scores on the bases of these two
variables were taken into account, it was
found that there was a significant difference
between survey and organizational samples
only in one of the two scoring procedures.

Individuals who took the Rosenzweig
Picture-Frustration test as part of an assessment
procedure showed a significantly higher
tendency to give responses which were scored
as impunitive than the group which took the
test anonymously. That is, they tended to
avoid the overt expression of hostility, and
to substitute for it statements evading or
denying the existence of frustration.

Two explanations were suggested for this
finding. One was that conscious faking on the
part of people in the organizational sample
resulted in an attempt to present as favorable
as possible a view of themselves. The other
was that the dispositional sets roused by the
appraisal situation succeeded, without the
subject’s awareness of the significance of
his responses, in depressing expressions of
emotion.


THE LIMITING HAND SKIN TEMPERATURE FOR
UNAFFECTED MANUAL PERFORMANCE
IN THE COLD


R. ERNEST CLARK 


Quartermaster Research and Engineering Command, Natick, Massachusetts


The data of several studies suggest that
manual performance is first affected by cold
exposure somewhere between 55°F and 65°F
hand skin temperature (e.g., Clark, 1959;
Clark & Cohen, 1960; Gaydos, 1958; Gaydos
& Dusek, 1958). However, this is only a
suggestion since these investigations were
designed to suit other experimental interests.

The purpose of the present study was to
establish the lower limit of hand skin temperature
(HST) for unaffected manual performance,
and to determine the stability of
this limiting temperature when duration of
exposure is varied. On the basis of data reported
by Gaydos (1958), and Gaydos and
Dusek (1958), 60°F HST was studied as
the possible limiting skin temperature for
unaffected performance in the cold, and 55°F
HST as the skin temperature initially associated
with severe cold effect.


METHODS 


Twelve white enlisted men, dressed in shorts and
shoes, were exposed to a constant ambient temperature
of 70°F and a 50% relative humidity.
Localized hand cooling was accomplished by the
exposure of S’s hands to 10°F air within a refrigeration
box. Hand skin temperature and knot-tying
performance were measured as in the Clark and
Cohen study (1960).

The experimental period lasted 4 days. Before
cooling each day, S practiced tying five sets of 15
knots. S’s HST was then raised to 90°F with a
heated muff and his hands inserted immediately into
the cooling box. Experimental performance times
were obtained on five different occasions during the
cooling process: (a) upon entrance of S’s hands into
the cooling unit; (b) when his HST had fallen to
the appropriate criterion temperature, referred to as
zero minutes’ exposure at criterion; (c) after 20
minutes’ exposure at the criterion HST; (d) after 40
minutes’ exposure; and finally, (e) after 60 minutes’
exposure.

On Days 1 and 2 of the 4-day experimental session,
the criterion HST for half of the Ss was 55°F, and
for the other half 60°F. On Days 3 and 4 the
criterion HSTs at which performance was measured
were reversed to exclude practice bias from the data.

Although performance was always measured at the
specific criterion temperatures, HST was permitted
to vary 4°F from the criterion during the 1-hour
exposure period. Thus, when S’s HST had fallen 4°F
below criterion, his hands were withdrawn from the
cooling unit and were exposed to the 70°F ambient
temperature. When his HST had risen 4°F above
criterion, his hands were reinserted into the cooling
box. It should be noted that the ranges of 55 ± 4°F
and 60 ± 4°F actually overlap.

RESULTS 


All scores were adjusted for initial, pre-experimental,
performance level by subtracting
S’s scores obtained at 90°F (HST) from
each of his succeeding scores on a given test
day. These deviation scores are shown in
Figure 1 as joint functions of HST and
exposure duration.


FIG. 1. Changes (Δ) in manual performance as joint
functions of hand skin temperature and duration
of cold exposure. (Positive changes are decrements.)
Ordinate: Δ performance-time from that at 90°F HST (sec).
Abscissa: exposure time at criterion HST (min): 0, 20, 40, 60.
Curves: 55 ± 4°F HST and 60 ± 4°F HST.


Analyses of variance of the adjusted data
indicated significant (p < .001) main effects
of HST and duration of exposure, but no sig- 
nificant interactions between the experimental 
variables (p’s > .10). HSTs of 55°F were 
consistently associated with performance 
decrements and these decrements increased 
over exposure duration, becoming asymptotic 
after about 40 minutes of exposure. In con- 
trast, performance at 60°F HST was never 
significantly different from that at 90°F HST, 
even though duration of exposure influenced 
performance somewhat at this skin tempera- 
ture level. 
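
The baseline adjustment applied before these analyses (subtracting each S's 90°F HST time from his later times on the same day) can be sketched as follows. The function names and all numerical values are hypothetical illustrations, not data from the study; only the subtraction rule and the four exposure occasions at criterion are taken from the text.

```python
def deviation_scores(baseline_sec, later_times_sec):
    """Subtract the 90°F HST performance time from each later time.

    Positive values indicate slower (poorer) performance than baseline.
    """
    return [t - baseline_sec for t in later_times_sec]


def mean_by_occasion(per_subject_deviations):
    """Average deviation scores across subjects for each test occasion."""
    n_subjects = len(per_subject_deviations)
    return [sum(column) / n_subjects for column in zip(*per_subject_deviations)]


# Hypothetical knot-tying times (sec) for two subjects on one test day:
# baseline at 90°F HST, then 0, 20, 40, and 60 minutes at criterion HST.
subject_a = deviation_scores(60.0, [64.5, 67.0, 70.25, 70.5])
subject_b = deviation_scores(58.0, [60.5, 63.0, 66.25, 66.5])
print(subject_a)
print(mean_by_occasion([subject_a, subject_b]))
```

Working with deviations rather than raw times removes each subject's individual speed from the comparison, so that differences between HST conditions are not confounded with differences in initial skill.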

To determine the stability of these findings 
for different groups of Ss, the total subject 
sample was divided into two groups of six 
Ss and the data of each group were analyzed 
for replication differences. None were found. 
Essentially the same effects of HST and 
duration of exposure occurred in both halves 
of the subject sample. 


DISCUSSION 


The present data suggest quite unequivo- 
cally that the HST at 60°F is not associated 
with performance hindrance due to cold 
exposure when tasks similar to the present 
one are used (tasks requiring much joint 
movement). In addition, critical performance 
decrements may be expected when HST falls 
5°F below this level, i.e., to 55°F HST.
These findings remained unaltered by ex- 
posure duration and were completely sup- 
ported by two samples of Ss. 

Presumably, some continuous function
passing from no effect to severe effect exists
between the HSTs studied here, but the determination
of the function would be extremely
difficult due to performance variability. Furthermore,
a finer difference than 5°F between
criterion HSTs would probably be unreasonable
because of the need to use HST ranges
to accomplish prolonged exposure periods.

Considering the findings of Clark and
Cohen (1960), it should be noted that the
present data for performance at 55°F HST
could have been achieved only with a
“medium” rate of hand cooling, that is, the
cooling rate normally associated with exposing
bare hands to air temperatures around
10°F. Very rapid hand cooling (exposure to,
say, subzero air) could have permitted surface
hand temperatures to drop to criterion levels
before internal hand temperatures had been
sufficiently lowered to hinder performance. 
Thus, the curve in Figure 1 for performance 
change at 55°F HST would have begun at 
the zero (no change) line, showing perform- 
ance decrement only later in the exposure 
period. Very slow hand cooling (exposure to 
20°F air or higher) could have negated the 
apparent influence of the present duration of 
exposure variable since internal hand tem- 
peratures might have become asymptotic be- 
fore performance was first tested at the 55°F 
HST criterion. In the latter case, the 55°F 
HST curve in Figure 1 would have appeared 
as a straight line displaced above and parallel 
to the zero line, illustrating a constant per- 
formance decrement across the exposure 
period. 
SUMMARY 

The hands of 12 enlisted men were cooled
to 55°F and 60°F surface temperature on
different experimental days. Performance
times to complete a standard knot-tying task
were obtained when S’s hands first reached
the appropriate hand skin temperature, after
20 minutes’ exposure at the criterion temperature,
after 40 minutes’ exposure, and after 60
minutes’ exposure.

It was found that performance was severely 
hindered when hand skin temperature fell to 
55°F, and that performance decrements at 
this skin temperature level were increasing 
exponential functions of duration of exposure, 
becoming asymptotic after about 40 minutes’ 
exposure. In contrast, performance at 60°F 
hand skin temperature remained unaffected 
throughout the exposure period.


A COMPARISON OF ONE-, TWO-, AND THREE-MAN
WORK UNITS UNDER VARIOUS CONDITIONS
OF WORK LOAD¹


J. S. KIDD 


Ohio State University 


One of the more persistent problems of 
management is that of dealing effectively with 
fluctuating work loads when there is a fixed 
equipment facility. Some degree of flexibility 
in over-all capacity has been thought to be 
possible if the operating crew could be aug- 
mented under peak load conditions. Recent 
evidence from studies of the performance of 
small groups or teams² (Kidd, 1958; Kinkade
& Kidd, 1958; Moore & Anderson, 1954; 
Versace, 1956), however, has cast some doubt 
on the efficiency of crew augmentation, per se, 
as a device to increase system capacity. The 
consensus of these reports has been that if a 
unitary³ task is distributed among more than
one operator, the gain in performance, if any, 
is disappointingly slight. That is, if a single 
person could handle a task adequately under 
moderate input load conditions, adding one or 
two helpers does far less than double or triple 
input load capacity. This result seems most 
pronounced when decision making activity is 
required, as opposed to more routine tasks. 

The purpose of the present study was to de- 
termine the effect of crew augmentation upon 
the performance of the particularly complex 
task of radar air traffic control. An additional 
consideration was the relationship between
team performance and the personnel composition
of the team. The question here was the
extent to which performance could be predicted
on the basis of some aspect of the individually
measured performance of the constituent
members. It has been said that a
group is more than the simple sum of its
members (Warriner, 1956). Various propositions
have been advanced to give more precise
meaning to this statement. For example, Simmel
(Wolff, 1950, pp. 26-36) speculates that
groups tend to perform at the level of the
poorest member. Kidd (1958), in a different
context, has suggested that the group might
act to inhibit the activity of the poorer member
and thus allow the better member to contribute
proportionately more to the ultimate
group output. Fiedler (1954) and others have
suggested that the perceived similarity of
member characteristics is among the determinants.
Finally, there is the possibility that
the simple average of the individual members
may provide the best prediction. The present
study attempted a partial evaluation of these
alternative propositions.


¹ This research was carried out in the Laboratory
of Aviation Psychology ... Force under ...

² The term “unit” is employed in the title of this
paper to avoid the anomaly of having a “one-man
team.”

³ Normally suitable to a single-man operation.


METHOD 
Apparatus, Task Setting, and Subjects. The general
task environment was provided by the simulation
within the laboratory of a radar landing-approach
control center. The simulation was implemented
by the specially developed OSU Electronic
Air Traffic Control Simulator (Hixson, Harter, Warren,
& Cowan, 1954). This device, which is built
around an analog computer, is capable of generating
up to 30 dynamic aircraft targets and presenting
them realistically to the radar controller via a cathode
ray tube display. Direct manipulation of the “aircraft”
is accomplished by college students trained to
faithfully carry out pilot functions. In addition to
the visual display of aircraft position available to
the controller, he is in direct auditory communication
with the “pilots” under his jurisdiction through
simulated radio channels.

The task of the controller or control team is the
guidance of the simulated aircraft through the preliminary
phases of a landing approach. This involves
the pickup and acknowledgment of aircraft entering
a specified zone of responsibility, guidance of their
flight course over a 50-mile approach route, altitude
and airspeed adjustment prerequisite to actual landing,
and positioning of the aircraft for acceptance by
a subsequent control agency for the final phase of
the landing process. The controller is also responsible
for coordination of departure clearances.

In the present study, the zone of responsibility was
of constant extent. When a single operator was active,
he was responsible for the total area. It was
divided in equal segments when a two- or three-man
team was employed.

Nine laboratory trained controllers participated as
subjects. All had approximately 6 months’ experience
in the control task at the time the study was
initiated.

Experimental Variables and Statistical Design. Two
independent variables were evaluated in this experiment:
the size of the control team or unit and the
level of input load. Three different unit sizes were
compared: a single-man operation, a two-man operation,
and a three-man operation. Input rate per
controller was sampled at one aircraft arrival every 90
seconds (on the average), one arrival every 60
seconds, and one aircraft arrival every 30 seconds.

The two variables were combined factorially, but
only six of the nine possible combinations of conditions
were actually tested. Table 1 illustrates the design
in graphic form. It was apparent that a complete
test of all possible combinations was not meaningful
nor even feasible within the context of the
present experiment. Thus, conditions yielding redundant
information were dropped as were those wherein
the over-all system input rate would have exceeded
the capacity of the system for sustained operation.
The six remaining cells or combinations of conditions
provided an opportunity to determine most effectively
the effects and coeffects of the two major
variables. The design is particularly advantageous in
that it provided for a comparison of input rate conditions
in terms of both input to the controller and
input to the total system. Thus, across the center
row, the input per controller was constant (input
load per controller is one aircraft every 60 seconds),
while the input to the total system varied directly
as a function of the number of controllers. Along the
intact diagonal of Table 1, however, input to the
total system is held constant while input per controller
varies inversely with the number of controllers
in the system. Interaction effects were subject to test
by comparison of four conditions: namely, 90
seconds/one controller, 60 seconds/one controller, 90
seconds/three controllers, and 60 seconds/three controllers.


TABLE 1

FACTORIAL DESIGN OF EXPERIMENT

Input Interval             Size of Control Unit
per Controller
in Seconds         One-Man     Two-Men     Three-Men

90                 90/90 a        -          90/30
60                 60/60        60/30        60/20
30                 30/30          -            -

a Number to left indicates input load on controller in terms
of input interval in seconds; number to right indicates input
load to total system (e.g., in order that each man of a three-man
team receive one arrival every 90 seconds, the whole system must
receive one arrival every 30 seconds). Cells marked by a dash
indicate conditions not included.

Procedure. Each problem consisted of the arrival 
of 24 aircraft. Three of these were at an intermedi 
ate stage of approach at the time the problem was 
begun. Problem duration was approximately 
utes with a 10-minute interval between problems 
Six problems were included in a single session. Four 
sessions were completed each week 


by a 


30 min 
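The relation between the two ways of expressing input load in Table 1 can be checked with a short sketch. It assumes that the interval for the total system equals the interval per controller divided by the number of controllers, and that the six tested cells are those implied by the center row and the intact diagonal described above. The names `TESTED`, `system_interval`, and `design_table` are illustrative only and do not come from the study.

```python
from fractions import Fraction

# Input intervals per controller (seconds) and unit sizes named in the text.
INTERVALS = (90, 60, 30)
TEAM_SIZES = (1, 2, 3)

# Assumption: the six tested cells, inferred from the description of the
# center row and the intact diagonal of Table 1.
TESTED = {(90, 1), (90, 3), (60, 1), (60, 2), (60, 3), (30, 1)}


def system_interval(per_controller, team_size):
    """Seconds between arrivals for the whole system."""
    return Fraction(per_controller, team_size)


def design_table():
    """Map each cell to 'controller/system', or '-' if not tested."""
    table = {}
    for interval in INTERVALS:
        for size in TEAM_SIZES:
            if (interval, size) in TESTED:
                total = system_interval(interval, size)
                table[(interval, size)] = f"{interval}/{total}"
            else:
                table[(interval, size)] = "-"
    return table


if __name__ == "__main__":
    cells = design_table()
    for interval in INTERVALS:
        print(interval, [cells[(interval, size)] for size in TEAM_SIZES])
```

Under these assumptions the diagonal cells (30 seconds for one man, 60 for two, 90 for three) all yield a system interval of 30 seconds, which is the constant-system-load comparison.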


Assignment of controllers to sessions and conditions
was accomplished in a cyclic program. The
steps in the development of the total program were
as follows: (a) each experimental condition appeared
once in each session and the order of conditions in
the session was random; (b) the sessions were arranged
in a series; (c) one controller was assigned
to each session by random selection; and (d) the
other controllers needed to fill out the two- and
three-man team conditions were drawn from adjacent
sessions so that, for example, the second man
in the two-man team was always the controller assigned
to the next succeeding session. This procedure
was employed to minimize differential practice effects
and to insure maximum participant utilization. The
major advantage of the method of scheduling was
that during the total experiment each participating
controller was active in each experimental condition
and at each control position. Each condition was
included once in each session.
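Steps (a) through (d) of the cyclic program can be sketched as follows. Several details are assumptions rather than facts from the report: that the series wraps around from the last session to the first, that the third man of a three-man team comes from the session after the next, and that rotation through control positions is left out. `CONDITIONS` and `build_schedule` are hypothetical names.

```python
import random

# The six tested conditions as (input interval per controller, unit size).
CONDITIONS = [(90, 1), (60, 1), (30, 1), (60, 2), (60, 3), (90, 3)]


def build_schedule(controllers, seed=0):
    """Return a list of sessions; each session is a list of problems."""
    rng = random.Random(seed)
    order = list(controllers)
    rng.shuffle(order)  # (c) one controller per session, chosen at random
    n = len(order)
    schedule = []
    for s in range(n):  # (b) sessions arranged in a series
        conditions = list(CONDITIONS)
        rng.shuffle(conditions)  # (a) each condition once, in random order
        problems = []
        for interval, size in conditions:
            # (d) additional team members are drawn from the succeeding
            # sessions; the wrap-around at the end is an assumption.
            team = [order[(s + k) % n] for k in range(size)]
            problems.append({"interval": interval, "team": team})
        schedule.append(problems)
    return schedule


if __name__ == "__main__":
    for number, session in enumerate(build_schedule("ABCDEFGHI"), start=1):
        print(number, [(p["interval"], "".join(p["team"])) for p in session])
```

With nine controllers this yields nine sessions of six problems each, or 54 problems in all.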


Measures of Performance. Two types of response
measures were utilized in this experiment to evaluate
the effect of the experimental variables.

1. The first major measurement category was system
efficiency. It included mean percent delay per
aircraft, mean fuel consumption per aircraft, and
number of missed approaches. Mean delay per aircraft
was calculated by determining the minimum
theoretical flight time, subtracting it from the observed
flight time, and dividing by the minimum
flight time. This delay score was then averaged for
all the aircraft in a problem, giving a mean percent
delay score for each problem. Estimated fuel consumption
in pounds was computed separately for
each aircraft on the basis of hypothetical curves
which took into account three factors: aircraft type,
airspeed, and altitude.

2. The second major system performance category
was safety. Separation errors were tallied during each
problem by a monitor-observer stationed in the
simulated control center. A separation error was defined
as the approach of one aircraft within 30
seconds’ flight time of another aircraft. This means
that as much as 6-mile lateral distance or 6,000-8,000
feet of altitude separation during descent was required
in the control area.

In addition to safety in the airspace of the control
zone, runway surface safety was also introduced as
a criterion of performance. A runway separation
error was scored if a 60-second safety interval was
not maintained subsequent to the departure clearance.
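The mean percent delay score defined under system efficiency can be written out as a brief sketch. The multiplication by 100 is an assumption implied by the word "percent", and the function names and flight times are illustrative only.

```python
def percent_delay(observed_time, minimum_time):
    """Delay of one aircraft as a percentage of its minimum flight time."""
    if minimum_time <= 0:
        raise ValueError("minimum flight time must be positive")
    return 100.0 * (observed_time - minimum_time) / minimum_time


def mean_percent_delay(flights):
    """Average the percent delay over all aircraft in one problem.

    `flights` is a sequence of (observed_time, minimum_time) pairs
    expressed in the same time unit.
    """
    scores = [percent_delay(obs, minimum) for obs, minimum in flights]
    if not scores:
        raise ValueError("a problem must contain at least one aircraft")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical flight times in minutes, not data from the study.
    print(mean_percent_delay([(20.0, 10.0), (15.0, 10.0)]))
```

On this definition a score of 100 means that an aircraft took twice its minimum theoretical flight time.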
RESULTS


The data were recorded from a total of 54 
problems. The over-all performance of the 
relatively novice, laboratory trained control- 
lers resulted in an average of 45 aircraft move- 
ments per hour; approximately 34 simulated 
landings and 11 departure clearances made up 
this total. This was equivalent to one aircraft 
movement every 80 seconds, on the average. 
The performance of the nine controllers as 
individuals was relatively homogeneous. The 
best controller had an average delay of 76% 
across the three conditions of input load, while 
the poorest had an average delay of 91%. 
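The throughput figures quoted above are mutually consistent, as a short check shows. The function name is illustrative only.

```python
def seconds_per_movement(movements_per_hour):
    """Average spacing between aircraft movements, in seconds."""
    return 3600 / movements_per_hour


if __name__ == "__main__":
    landings, departures = 34, 11  # approximate hourly figures from the text
    total = landings + departures  # 45 movements per hour
    print(total, seconds_per_movement(total))
```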

Tables 2, 3, and 4 summarize the findings
of this experiment with regard to the two
main effects. As shown in Table 2, the effect
of input load on a single controller’s performance
was quite pronounced over the range
of values sampled, using mean percent delay
and mean percent excess fuel consumption as
the criteria. In line with observations made
in previous studies in this series (Schipper,
Versace, Kraft, & McGuire, 1956; Versace,
1956), performance decrement was sharply
accelerated when the average interval between
aircraft dropped below 60 seconds. While
the other indices of performance in Table 2
tend to support the above relationship, the
differences between experimental conditions
were not statistically reliable, as determined
by the nonparametric χ²ᵣ test (Siegel, 1956).
This lack of statistical reliability is at least
partially attributable to the low over-all frequency
of occurrence of a measurable event
such as a missed approach.

Table 3 presents a comparison of various 
sized control units where the input to the to- 
tal system was held constant. Since the result- 
ant input load per controller decreased pro- 
portionately as unit size was increased, a rea- 
sonable expectation would be a progressive 
improvement in performance with the larger 
teams. The observations in Table 3 failed to 
verify such an expectation. There was some 
small improvement as load per controller was 
moderated, but this change was not enough
to allow the rejection of the null hypothesis.

The data summarized in Table 4 indicate
that when load per controller was held constant
(resulting in an increasing total system
load when control unit size is increased)
performance was subject to a progressive decline
with increasing unit (team) size, a decrement
which was statistically reliable for the measures
of mean percent delay and fuel consumption.
With the other indices in Table 4, these
differences were not statistically significant.
There was no observable interaction between
control unit (team) size and input load.
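The unit-size comparisons are reported as tested with the Kruskal-Wallis one-way analysis of variance (Siegel, 1956). The sketch below computes the H statistic in its standard form without the correction for ties. The function names are illustrative and the scores in the example are hypothetical, not values from this experiment.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j ** 2 / n_j) - 3 (N + 1)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical mean percent delay scores for three unit sizes.
    one_man = [118.0, 121.5, 125.0]
    two_man = [120.0, 123.0, 127.5]
    three_man = [129.0, 133.5, 136.0]
    print(kruskal_wallis_h([one_man, two_man, three_man]))
```

When the samples are not very small, H is referred to the chi-square distribution with k - 1 degrees of freedom, where k is the number of groups compared.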

It is necessary to re-emphasize several peculiarities
of the experimental design at this
point to help in the evaluation of the results.
As has been indicated previously, it was not
possible to vary team size without the occurrence
of a concomitant shift in input load
characteristics either with regard to the system
or to the individual operators. The solution
of this dilemma employed by the present
study was to explore both major alternatives,
constant load per system/varying load per
controller, and varying load per system/constant
load per controller. Even with the use
of this technique, however, there remains the
inevitable confounding, and any conclusions
must be qualified by this fact.


TABLE 2

THE EFFECT [remainder of title and all entries missing]

Mean percent delay
Mean percent excess fuel
Missed approaches per aircraft processed
Mean departures permitted per 30-minute problem


TABLE 3

THE RELATIVE PERFORMANCE OF VARIOUS SIZED CONTROL UNITS
WITH A CONSTANT INPUT LOAD TO THE SYSTEM

Criteria                                     One-Man  Two-Man  Three-Man

Mean percent delay                            129.6    122.6    122.5
Mean percent excess fuel                      146.8    134.4    126.8
Separation errors per aircraft processed
  Airborne
  Runway
Missed approaches per aircraft processed
Mean departures permitted per 30-minute problem

ᵃ The Kruskal-Wallis one-way analysis of variance test was
employed (Siegel, 1956).
ᵇ The relatively high frequency of zero cell entries for this
condition severely limits the usefulness [remainder missing]


The predictability of team performance on
the basis of individual scores was assessed
using four predictor variables. The variables
used were the average score of the constituent
members, the better (best) member’s score,
the poorer (poorest) member’s score, and the
variability (range) of the constituent members’
individual scores. No reliable prediction
of the team score was obtained.
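The four predictor variables can be derived from the members' individual scores as in the sketch below. It assumes that lower scores are better, as with percent delay. The function name is illustrative, and pairing the two scores in the example into one team is hypothetical.

```python
def team_predictors(member_scores, lower_is_better=True):
    """Return the four team predictors from the members' own scores."""
    scores = list(member_scores)
    if len(scores) < 2:
        raise ValueError("a team needs at least two members")
    low, high = min(scores), max(scores)
    best, poorest = (low, high) if lower_is_better else (high, low)
    return {
        "average": sum(scores) / len(scores),
        "best": best,
        "poorest": poorest,
        "range": high - low,
    }


if __name__ == "__main__":
    # Hypothetical two-man team built from two individual delay scores.
    print(team_predictors([76.0, 91.0]))
```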

A final consideration was controller com- 
munication activity. Each problem was sam- 
pled for the proportion of the controller’s time 
spent in communication with pilots and the 
proportion of his time utilized in communi- 
cation with fellow controller(s). It was ob- 
served that in the three-man teams nearly 
30% of the center controller’s communication 
time was spent talking to the other controllers. 


These proportions represent only a single controller’s
talk time to other members of the
pattern-feeder team and do not include either
the duration of the reply (i.e., the controller’s
“output” only was included) or communications
to or from those who played
ancillary roles in the control center simulation.


TABLE 4

THE RELATIVE PERFORMANCE OF VARIOUS SIZED CONTROL UNITS
WITH A CONSTANT INPUT LOAD TO THE CONTROLLER

Criteria                                     One-Man  Two-Man  Three-Man

Mean percent delay                                     122.6    132.6
Mean percent excess fuel                               134.4    146.0
Separation errors per aircraft processed
  Airborne
  Runway
Missed approaches per aircraft processed
Mean departures permitted per 30-minute problem

[Other legible entries, not assignable to rows or columns:
092, 063, 052, 063, 033, 042, 59]


DISCUSSION 


The results of this experiment indicate that 
simple crew augmentation does not necessarily 
improve the capacity of a complex man-ma- 
chine system. These findings fit within a grow- 
ing body of data on group and team perform- 
ance that now covers a considerable variety 
of task dimensions. The basic finding that
group productivity is proportionately inferior
to individual productivity has not been contradicted.
The mechanism behind this finding
has become progressively more apparent;
intrinsic to team performance is the requirement
of coordination. This requirement is superimposed
on the normal demands of the
task itself and leads to a proportionate reduction
of exclusively task-directed behavior.

It is clear that there are practical limits
on the applicability of such a generalization.
In the first place, there exist numerous task
situations which, because of the variety of operations
involved, their scope, and duration,
preclude autonomous individual activity. It is
possible also that there is a motivationally
beneficial by-product of integrated group activity
that leads to continuity of group effort
(Bavelas, 1953). This factor can have great
practical significance in those instances where
voluntary participation, rather than arbitrary
assignment to a group, is the rule. Furthermore,
while the processing rate of the work
unit is not increased proportionately as team
size is increased, there is good reason to believe
that the larger units do have an increased
ability to fulfill a buffer storage function in
the system. Previous studies in this series
(Kidd & Hooper, 1958; Kidd & Kinkade,
1958) have shown that the most sensitive
measure of load is the number of aircraft under
control rather than input rate, per se.
Since the data from the present study derive
from a manipulation of input rate, no direct
statement regarding “number of aircraft under
control” is warranted. However, on a suppositional
level, it can be proposed that the
major residual advantage of the larger work
units for short-term system effectiveness is
the increased temporary storage capacity that
they provide.

Nevertheless, it seems clear that a system 
design and system management which mini- 
mizes interoperator coordination and integra- 
tion demands will yield superior performance. 
It now becomes the problem of future investi- 
gations to determine precise techniques for 
work load allocation that lead to maximum 
operator autonomy but do not necessarily ex- 
clude all opportunities for operator interac- 
tion with its potential benefit to morale. 

The somewhat paradoxical inferiority of the 
group as compared to the individual raises 
again the issue of the contribution of each 
constituent member to the team. The results 
of the attempted prediction of group from individual
performance undertaken in this experiment
were negative and therefore must remain
inconclusive. It is quite apparent, however,
that the disappointingly low output of
the group is not due simply to limits imposed 
by the poorest member. 


SUMMARY 


In this study a comparative evaluation was 
made of the effect of input load and team size 
on the productivity of a radar approach con- 
trol unit. The context was a simulated radar 
approach control center, and the task assigned
was that of pattern-feeder controller.

Nine laboratory trained controllers participated
in a total of 54 problems. Input load
was varied by spacing the interval between
aircraft arrivals at either 90 seconds, 60 seconds,
or 30 seconds, on the average. Control
unit size was varied by using one, two, and
three operators per unit.

Results regarding input load confirmed
previous findings that performance falls off
sharply as load is increased. When input load
to the system was held constant and the control
unit size was increased, leading to a decrease
in load per controller, performance was
upgraded only moderately. When input load
to the system was increased proportionately to
the increase in team size, resulting in a constant
load per controller across conditions,
performance was markedly diminished in the
multiman units.

No reliable prediction of team performance
was observed on the basis of four predictor
variables derived from individual performance
indices.

The tentative conclusions were that maximum
performance can be attained from multiman
system operations when the coordination
demands are minimized. A reservation imposed
on this conclusion was suggested from
other cited research which has indicated that
complete functional isolation or autonomy
may have deleterious motivational effects in
the long run.